1 Chapter 2: How many outsourced workers are there in the UK?
1.1 How many UK workers are outsourced?
#how-many
Around 1 in 6 UK workers meet our definition of an outsourced worker
The ‘outsourced sub-group’ is the most dominant of the three sub-groups - meaning the total group is predominantly made up of people who self-identify as an outsourced worker and they say they are hired to do work that is long-term or ongoing. People included in this sub-group (either uniquely, or while also meeting the criteria for at least one of the other sub-groups) make up around 67% (check) of our total outsourced group, or nearly 7 in 10. This group makes up X of all UK workers.
In terms of the different possible types of outsourced groups2, the numbers are as follows:
Definitely outsourced: 11%
Likely agency: 3%
High indicators: 3%
People included in this sub-group (either uniquely, or while also meeting the criteria for at least one of the other sub-groups) make up around 68% of our total outsourced group. This group makes up 11% of all UK workers.
#non-exclusive-subgroups1
The two other sub-groups – the agency and indicators sub-groups – are less dominant in comparison. Around 58% of outsourced respondents meet the criteria for either or both of these sub-groups, but this falls to around 33% if we exclude people who are already captured in the outsourced sub-group. Excluding the first sub-group, these other two groups make up X of all UK workers.
The percentages here refer to the number of people who are outsourced (super-ordinate group), not the total number of respondents. Below I provide percentages as a function of the outsourced super-ordinate group as well as the total sample.
Group criteria
Outsourced, defined as responding ‘I am sure I am outsourced’ or ‘I might be outsourced’, and responding ‘I do work on a long-term basis’.
Likely agency, defined as those responding ‘I am sure I am agency’ and ‘I do work on a long-term basis’, excluding those people who are already defined as being outsourced.
High indicators: defined as responding TRUE to 5 or 6 of the outsourcing indicators, as well as responding ‘I do work on a long-term basis’, excluding those people who are already defined as outsourced or likely agency.
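The group criteria above can be read as an ordered set of rules, applied in sequence so that each respondent lands in at most one group. A minimal Python sketch of that logic follows; the function name, argument names and answer codes are hypothetical stand-ins for the survey variables, not the ones used in the analysis.

```python
def classify_respondent(outsourced_answer, agency_answer, long_term, n_indicators):
    """Assign a respondent to one outsourcing group, checking criteria in order.

    All argument names and answer codes are illustrative assumptions:
      outsourced_answer: "sure", "might" or "no"
      agency_answer:     "sure" or anything else
      long_term:         True if they answered 'I do work on a long-term basis'
      n_indicators:      how many of the six outsourcing indicators are TRUE
    """
    # Every group requires long-term or ongoing work.
    if not long_term:
        return "Not outsourced"
    # 1. Self-identified as (possibly) outsourced.
    if outsourced_answer in ("sure", "might"):
        return "Outsourced"
    # 2. Sure they are agency, and not already counted as outsourced.
    if agency_answer == "sure":
        return "Likely agency"
    # 3. Five or six indicators, and not already counted above.
    if n_indicators >= 5:
        return "High indicators"
    return "Not outsourced"


print(classify_respondent("no", "sure", True, 6))
```

Because the checks run in order, someone who meets both the agency and the indicator criteria is counted once, as likely agency, which is what the "excluding those people who are already defined as" wording implies.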
Including outsourced group

| agency_or_indicator | freq | n | total | perc | N |
|---|---|---|---|---|---|
| agency | 342.6956 | 344 | 10155 | 3.374649 | 10155 |
| both | 106.3656 | 116 | 10155 | 1.047421 | 10155 |
| indicator | 513.2645 | 516 | 10155 | 5.054303 | 10155 |
| neither | 9192.6744 | 9179 | 10155 | 90.523627 | 10155 |
Excluding outsourced group

| agency_or_indicator | freq | n | total | perc | N |
|---|---|---|---|---|---|
| agency | 231.43068 | 231 | 8993.922 | 2.5731897 | 9032 |
| both | 35.10624 | 38 | 8993.922 | 0.3903329 | 9032 |
| indicator | 280.74106 | 291 | 8993.922 | 3.1214531 | 9032 |
| neither | 8446.64421 | 8472 | 8993.922 | 93.9150243 | 9032 |
9.48% of the whole sample meet the criteria for either or both of these sub-groups. This falls to 6.08% if we exclude people who are already captured in the outsourced sub-group.
Out of those who are in the ‘outsourced’ status (i.e., the combination of the three outsourced groups), 57.99% meet the criteria for either or both of these sub-groups, but this falls to around 33.27% if we exclude people who are already captured in the outsourced sub-group.
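The four percentages above can be reproduced from the two tables. The sketch below assumes that the whole-sample shares are sums of the weighted `perc` column, and that the shares of the outsourced group use the unweighted `n` column divided by 1,683, the N reported for the combined outsourced group in the age table later in this chapter. That reading is inferred from the figures rather than stated by the analysis.

```python
# (perc, n) for the agency, both and indicator rows, copied from the tables above.
including = {"agency": (3.374649, 344), "both": (1.047421, 116), "indicator": (5.054303, 516)}
excluding = {"agency": (2.5731897, 231), "both": (0.3903329, 38), "indicator": (3.1214531, 291)}

N_OUTSOURCED = 1683  # assumption: unweighted N of the combined outsourced group


def share_of_sample(rows):
    """Sum the weighted percentages across the three rows."""
    return sum(perc for perc, _ in rows.values())


def share_of_outsourced(rows):
    """Unweighted count in the three rows as a percentage of the outsourced group."""
    return 100 * sum(n for _, n in rows.values()) / N_OUTSOURCED


for label, rows in (("including", including), ("excluding", excluding)):
    print(label, round(share_of_sample(rows), 2), round(share_of_outsourced(rows), 2))
```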
#non-exclusive-subgroups2
There is some overlap between these sub-groups, but they are not like for like. Just over a quarter (27%) of respondents are in more than one sub-group, while nearly three quarters (73%) of respondents are uniquely captured in just one of the three sub-groups.
Just over a quarter (26.35%) of respondents are in more than one sub-group, while nearly three quarters (73.65%) of respondents are uniquely captured in just one of the three sub-groups.3
1.2 Evaluating our total estimate
#evaluating-total-estimate To do
Around 1 in 4 “outsourced” respondents sit in more than one sub-group within our definition, but around 3 in 4 are uniquely captured in just one of the three sub-groups - predominantly in the outsourced sub-group.
As figure X shows, not all respondents in the outsourced sub-group said yes to five or six of our six outsourcing
2.2 Evidence paints a racialised picture of outsourcing in the UK, with links to both ethnicity and migration
#ethnicity
More than 1 in 4 (nearly 1/3) outsourced workers are from an ethnic minority background
Workers from ethnic minority backgrounds are disproportionately over-represented in outsourced work in the UK, and typically more likely to be outsourced than White British workers.
Overall, 22% of non-outsourced workers are from an ethnic minority background, rising to 33% of outsourced workers – a more than ten percentage point difference. This means that while just over 1 in 5 non-outsourced workers in our sample were from an ethnic minority background, nearly 1 in 3 outsourced workers were.
People from an ethnic minority background are overall 1.75 times more likely to be outsourced than people from a White British background.
Workers from Arab backgrounds are 3.39 times more likely than White British workers to be outsourced. (Check sample size – are we confident in all of these significance tests, or should we just use some of them in these bullet points?)
Workers from Black backgrounds are 2.33 times more likely than White British workers to be outsourced.
Workers from Asian backgrounds are 1.98 times more likely than White British workers to be outsourced.
Workers from Mixed Ethnicity backgrounds are 1.86 times more likely than White British workers to be outsourced.
White other workers are 1.30 times more likely than White British workers to be outsourced.
People from an ethnic minority are 1.75 times more likely to be outsourced than people from a White British background; 33.09% of outsourced workers are from an ethnic minority, compared to 21.99% of non-outsourced workers.16
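The "1.75 times more likely" figure is consistent with an odds ratio built from the two percentages quoted above. The sketch below assumes that is how it was derived (the underlying model is not shown here), and the helper names are illustrative only.

```python
def odds(p):
    """Convert a proportion into odds, e.g. 0.25 -> 1/3."""
    return p / (1 - p)


def odds_ratio(p_group, p_reference):
    """Ratio of the odds in one group to the odds in a reference group."""
    return odds(p_group) / odds(p_reference)


# Share from an ethnic minority: 33.09% of outsourced vs 21.99% of non-outsourced workers.
or_ethnic_minority = odds_ratio(0.3309, 0.2199)
print(round(or_ethnic_minority, 2))  # 1.75
```

In a two-by-two table the odds ratio is symmetric, so the same value describes the odds of being outsourced for ethnic minority workers relative to White British workers.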
Overall, there is no interaction between being from a minority and outsourced on whether you are low paid. i.e., being from an ethnic minority and outsourced is not associated with being in the low pay group.17
However, there is nuance in the groups. There is evidence to suggest that people who are Black and outsourced are less likely to be in the high income group (OR = 0.44). People who are from an ‘other’ ethnic group and outsourced are more likely to be in the high income group (OR = 18.8) (see tables in Section 2.1).
Ethnicity (binary) by outsourcing status and income group (%)

| outsourcing_status | income_group | White British | Non-White British |
|---|---|---|---|
| Not outsourced | Not low | 78.53 | 21.47 |
| Not outsourced | Low | 80.18 | 19.82 |
| Outsourced | Not low | 64.95 | 35.05 |
| Outsourced | Low | 68.67 | 31.33 |
Comparison of ethnicities indicates that some groups are statistically more likely to be outsourced than others18:
Arab/British Arab workers are 3.386 times more likely than White British workers to be outsourced.
Asian/Asian British workers are 1.982 times more likely than White British workers to be outsourced.
Black/African/Caribbean/Black British workers are 2.334 times more likely than White British workers to be outsourced.
Mixed/Multiple ethnic group workers are 1.865 times more likely than White British workers to be outsourced.
Workers who preferred not to state their ethnicity are 1.389 times more likely than White British workers to be outsourced.
White other workers are 1.296 times more likely than White British workers to be outsourced.
Comparison of more disaggregated ethnicities indicates more nuance19:
Any other White background workers are 1.41 times more likely than White British workers to be outsourced.
White and Black African workers are 4.12 times more likely than White British workers to be outsourced.
Any other Mixed / Multiple ethnic background workers are 2.73 times more likely than White British workers to be outsourced.
Indian workers are 1.79 times more likely than White British workers to be outsourced.
Pakistani workers are 3.23 times more likely than White British workers to be outsourced.
Bangladeshi workers are 2.48 times more likely than White British workers to be outsourced.
Any other Asian background workers are 2.18 times more likely than White British workers to be outsourced.
African workers are 2.57 times more likely than White British workers to be outsourced.
Any other Black, Black British, or Caribbean background workers are 2.65 times more likely than White British workers to be outsourced.
Arab workers are 3.39 times more likely than White British workers to be outsourced.
#ethnicity-sub-group
These differences in ethnicity also shift slightly depending on which outsourced “sub-group” we look at. For example, compared to White British workers, Black outsourced workers are more likely to be in the “outsourced sub-group” meaning they have self-identified as outsourced, or the “agency sub-group”, meaning they are agency workers doing more long-term and ongoing work. Are there any other interesting points to mention here? Should we do a chart showing this different across sub-groups? Do we need an interpretive comment in this section?
Breaking down by outsourcing group helps to separate out the type of outsourced work people from the ethnicities identified above engage in.20 Compared to White British workers,
Arab people are more likely to be likely agency or high indicators
Asian people are more likely to be in any of the groups
Black people are more likely to be likely agency or outsourced
People of mixed ethnicity are more likely to be outsourced
People who selected Other ethnicity are more likely to be agency
White other people are more likely to be outsourced
More nuance from disaggregated ethnicities21. The table below shows the likelihood of workers of different ethnicities falling into each of the outsourcing groups, compared to White British workers. Note that only significant relationships are shown here. Note also that the ‘n’ for many of these statistics is very low. As such many of these statistics are illustrative but not inferential.
Likelihood of belonging to different groups compared to White British. Note: NAs are non-sig. relationships. 'n_' is sample size, 'freq_' is weighted sample size

| Ethnicity | Outsourced | Likely agency | High indicators | n_Outsourced | n_Likely agency | n_High indicators | freq_Outsourced | freq_Likely agency | freq_High indicators |
|---|---|---|---|---|---|---|---|---|---|
| Gypsy or Irish Traveller | NA | 0.00 | 0.00 | 2 | NA | NA | 2.48 | NA | NA |
| Any other White background | 1.59 | NA | NA | 63 | 10 | 7 | 72.25 | 13.33 | 8.37 |
| White and Black African | 4.59 | NA | NA | 21 | 2 | 3 | 11.08 | 0.91 | 2.62 |
| Any other Mixed / Multiple ethnic background | NA | 4.87 | NA | 15 | 5 | 3 | 9.84 | 4.33 | 1.71 |
| Indian | 1.57 | NA | 2.64 | 32 | 8 | 15 | 43.96 | 11.83 | 18.18 |
| Pakistani | 2.88 | 3.83 | 4.11 | 29 | 8 | 12 | 32.69 | 9.74 | 11.43 |
| Bangladeshi | 2.84 | NA | NA | 15 | 3 | 3 | 17.95 | 2.61 | 2.48 |
| Any other Asian background | 2.17 | 2.66 | NA | 17 | 5 | 4 | 30.35 | 8.34 | 6.10 |
| African | 2.54 | 3.09 | NA | 74 | 22 | 15 | 47.20 | 12.82 | 9.93 |
| Any other Black, Black British, or Caribbean background | 3.13 | NA | NA | 13 | 1 | 2 | 9.46 | 1.16 | 1.16 |
| Arab | NA | 6.30 | 6.15 | 3 | 2 | 2 | 4.97 | 3.42 | 3.63 |
| Any other ethnic group | NA | 6.35 | NA | 1 | 1 | 1 | 1.52 | 3.93 | 1.60 |
| Don’t think of myself as any of these | NA | NA | 0.00 | 4 | 1 | NA | 2.54 | 0.40 | NA |
| Prefer not to say | NA | NA | 6.94 | 1 | 1 | 2 | 1.67 | 0.52 | 4.72 |
#ethnicity-pay-split
On the low-pay / high-pay split, you say “A person is more likely to be in the low income group if they are: Older; Female; Prefer not to say when they arrived, And less likely if they are: Asian/Asian British; Live in North West or Wales; Arrived in the UK in last 30 years”; Can I confirm this means we don’t see any other significant differences in the ethnicity breakdown if we look at high paid vs low paid workers? If so, let’s clarify what this says about how ethnicity relates to a) outsourced workers being disproportionately low paid, but b) ethnic minority workers being no more likely to be in our low pay group.
Using the new ethnicity groupings, there is no evidence indicating that any ethnicity is more or less likely to be in the low income group
Note to self: This could benefit from stepwise regression
A person is more likely to be in the low income group if they are:
Older
Female
Don’t have a degree (or don’t know if they have a degree?)
Are outsourced
Arrived in the UK in the last year
And less likely if they are:
Younger
Male
Have a degree
Live in the North West or Wales (compared to London)
Arrived in the UK in last 30 years
#migration
As you would expect, the vast majority of outsourced workers were born in the UK. However, we still see a significantly higher likelihood of outsourced workers having been born outside of the UK compared to people who aren’t outsourced. While around 14% of non-outsourced workers were born outside of the UK, this rose to just over 24% for outsourced workers – or nearly 1 in 4.
Overall, people who were born outside of the UK are 1.94 times more likely to be in outsourced work than people who were born here.
As for non-outsourced workers, the vast majority of outsourced workers are born in the UK. However, people not born in the UK are more likely to be outsourced than people born in the UK. 24.13% of outsourced workers are not born in the UK, compared to 14.08% of non-outsourced workers.22 This difference is statistically significant; outsourced workers are 1.94 times more likely to have been born outside the UK than non-outsourced workers.23
#migration-sub-groups
This pattern broadly holds across our three outsourcing sub-groups, with nearly no difference in the likelihood of people born outside of the UK being in any one of the three groups.
#ethnicity-migration-interaction. Some attention needed here
Among all workers who were born in the UK:
Black workers are 2.01 times more likely to be outsourced than a White worker
Asian workers are 2.02 times more likely to be outsourced than a White worker.
Workers from Other ethnic backgrounds are X times more likely to be outsourced than a White other worker
For workers born outside of the UK:
Among White workers, someone not born in the UK is 1.82 times more likely to be outsourced than someone born in the UK.
Among workers from Mixed ethnic backgrounds, someone not born in the UK is 2.73 times more likely to be outsourced than someone born in the UK.
Among Other workers, someone not born in the UK is 0.13 times as likely (i.e., less likely) to be outsourced as someone born in the UK.
For workers from other ethnicities, it doesn’t matter whether you are born in the UK or not – you are equally likely as a Black or an Asian worker to be outsourced, whether you were born in the UK or somewhere else. And compared to a White person born in the UK, Black African and South Asian workers specifically are more likely to be outsourced, whether or not they were born in the UK. Does this need any further detail or explanation?
To discuss confidence in our interpretation in this section: The evidence on ethnicity and country of birth clearly paints a racialised picture of outsourcing, and one with colonial undertones, as Black African and South Asian workers see a higher risk of being outsourced compared to White British workers, regardless of their country of birth. This obviously raises further questions about why, linked to (sector, occupation, labour market inequality and structural racism). Discuss the draft interpretation in the comment on the right.
However, workers from non-White ethnic groups are not the only workers who see a higher risk of being outsourced: Non-UK-born White workers are also more likely to be outsourced than UK-born White people. Ethnicity and country of birth interact independently for some groups, but seem to be fundamentally connected for others.
Exploring the intersection of ethnicity and arrival time reveals some patterns whereby the likelihood of a person being outsourced is related to the combinations of ethnicity and whether they were born in the UK.24 The plot below shows that
Among workers born in the UK, a Black worker is 2.01 times more likely to be outsourced than a White British worker.
Among workers born in the UK, an Asian worker is 2.03 times more likely to be outsourced than a White British worker.
Among workers born in the UK, an Other ethnicity worker is 4.31 times more likely to be outsourced than a White other worker.
Among workers not born in the UK, a White other worker is 0.58 times as likely (i.e., less likely) to be outsourced as a White British worker.
Among workers not born in the UK, a White other worker is 0.52 times as likely (i.e., less likely) to be outsourced as a Black worker.
Among workers not born in the UK, a White other worker is 0.36 times as likely (i.e., less likely) to be outsourced as a worker of mixed ethnicity.
Among White British workers, someone not born in the UK is 2.48 times more likely to be outsourced than someone born in the UK.
Among Mixed workers, someone not born in the UK is 2.73 times more likely to be outsourced than someone born in the UK.
Among Other ethnicity workers, someone not born in the UK is 0.13 times as likely (i.e., 87% less likely) to be outsourced as someone born in the UK.
Among people who preferred not to say their ethnicity, someone not born in the UK is 1.95 times as likely (i.e., 95% more likely) to be outsourced as someone born in the UK.
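The parenthetical percentages in the bullets above follow from the ratios as (ratio − 1) × 100: a ratio below 1 reads as "less likely" and a ratio above 1 as "more likely". A small sketch of that conversion is given here; the function name is illustrative, and since the ratios are presumably odds ratios, the percentages strictly describe a change in odds.

```python
def describe_ratio(ratio):
    """Express a 'times as likely' ratio as a percentage change."""
    change = round((ratio - 1) * 100)
    if change < 0:
        return f"{-change}% less likely"
    if change > 0:
        return f"{change}% more likely"
    return "equally likely"


for ratio in (0.13, 1.95):
    print(ratio, "->", describe_ratio(ratio))
```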
#migration-by-pay-split
If we do a basic “born UK / not born UK” split, looking by low and high pay, what % of the low-paid workers group were born outside of the UK, vs in the high-paid group?
20.96% of outsourced workers in the low pay group were not born in the UK, compared to 26.39% of people in the not low pay group. This difference is marginally statistically significant; someone in the low income group is less likely to be born outside the UK than someone in the not low income group. This pattern is the same for non outsourced workers, and when we consider the interaction between outsourcing status and migration status, the only factor predicting income group is outsourcing status.
2.3 Outsourced workers are on average younger than non-outsourced workers
#age
We find that outsourced workers are significantly younger than non-outsourced workers, on average. The median age of an outsourced worker is 36, compared to a median age of 43 for a non-outsourced worker.
The outsourced and indicator sub-groups – people who directly said that they were or might be outsourced, or ticked a high number of our indicators of outsourced working – see higher proportions of younger workers than the “agency” sub-group.
#age-violin
INSERT VIOLIN PLOT CHART HERE SHOWING MEDIAN AGE OF EACH SUB-GROUP, COMPARED TO NON-OUTSOURCED WORKERS. Is this necessary? We already have the density plots
Outsourced workers are on average younger than non-outsourced workers. The median age of the outsourced group is 36, compared to 43 for the not outsourced group.26 This difference is statistically significant.27
| Outsourcing group | Mean | Median | Min | Max | Standard dev. | N |
|---|---|---|---|---|---|---|
| Not outsourced | 42.80 | 43 | 16 | 80 | 13.08 | 8472 |
| Outsourced | 38.63 | 36 | 16 | 78 | 13.07 | 1683 |
The higher concentration of younger workers identified above appears to be driven primarily by the ‘outsourced’ and ‘high indicator’ groups, whilst the ‘likely agency’ group follows a similar pattern to the non-outsourced group.28
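| Outsourcing status | Income group | Mean | Median | Min | Max | Standard dev. | N |
|---|---|---|---|---|---|---|---|
| Not outsourced | Not low | 41.97 | 41 | 18 | 78 | 12.47 | 5280 |
| Not outsourced | Low | 42.87 | 43 | 16 | 80 | 15.09 | 1644 |
| Outsourced | Not low | 37.96 | 35 | 18 | 77 | 12.53 | 986 |
| Outsourced | Low | 39.05 | 37 | 16 | 78 | 14.06 | 381 |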
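| Outsourcing group | Mean | Median | Min | Max | Standard dev. | N |
|---|---|---|---|---|---|---|
| Not outsourced | 42.80 | 43 | 16 | 80 | 13.08 | 8472 |
| Outsourced | 38.40 | 35 | 16 | 78 | 13.09 | 1123 |
| Likely agency | 39.80 | 38 | 18 | 77 | 13.49 | 269 |
| High indicators | 38.49 | 35 | 18 | 72 | 12.55 | 291 |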
| Outsourcing group | Income group | Mean | Median | Min | Max | Standard dev. | N |
|---|---|---|---|---|---|---|---|
| Not outsourced | Not low | 41.97 | 41.00 | 18 | 78.0 | 12.47 | 5280 |
| Not outsourced | Low | 42.87 | 43.00 | 16 | 80.0 | 15.09 | 1644 |
| Outsourced | Not low | 37.81 | 34.52 | 18 | 67.0 | 12.57 | 625 |
| Outsourced | Low | 39.07 | 37.00 | 16 | 78.0 | 13.89 | 272 |
| Likely agency | Not low | 39.33 | 38.00 | 18 | 77.0 | 12.66 | 168 |
| Likely agency | Low | 39.35 | 37.00 | 19 | 71.5 | 15.66 | 63 |
| High indicators | Not low | 37.29 | 35.00 | 18 | 65.0 | 12.25 | 193 |
| High indicators | Low | 38.42 | 34.59 | 19 | 67.0 | 12.82 | 46 |
#gender
The evidence also finds meaningful differences by gender between the outsourced and non-outsourced groups in our data. Men make up 56% of the outsourced workforce compared to 47% of the non-outsourced workforce, a nearly 10 percentage point difference.
Outsourced workers are 1.44 times more likely to be male than female.
The group with the largest proportion of men in the workforce is the ‘high indicators’ group (66.35%), followed by the ‘likely agency’ group (56.66%), followed by the ‘outsourced’ group (53.94%). Comparison of outsourced and non-outsourced workers finds that
Someone in the high indicators sub-group is 2.18 times more likely to be male than female.
Someone in the agency sub-group is 1.45 times more likely to be male than female.
Someone in the outsourced sub-group is 1.31 times more likely to be male than female.
#gender-sector
Possible addition: Will readers want to know more about how this intersects with the roles or sectors with higher rates of outsourcing – even if this is just an interpretive comment from us on how gender interacts with jobs and sectors more generally in the labour market?
The outsourced workforce consists of a greater proportion of males than the non-outsourced workforce.29 Men make up 56% of the outsourced workforce compared to 47% of the non-outsourced workforce. This difference is statistically significant; outsourced workers, compared to non-outsourced workers, are 1.44 times more likely to be male than female.30
Breaking down by outsourcing group, we find that the group with the largest proportion of men in the workforce is the ‘high indicators’ group (66.35%), followed by the ‘likely agency’ group (56.66%), followed by the ‘outsourced’ group (53.94%). Statistically speaking, compared to a not outsourced person,
Someone in the high indicators group is 2.18 times more likely to be male than female.
Someone in the likely agency group is 1.45 times more likely to be male than female.
Someone in the outsourced group is 1.31 times more likely to be male than female.
Additionally, people identifying as ‘Other’ gender are absent from the high indicators and likely agency groups, though given the small N (14) for this group, this finding is unlikely to be meaningful.
2.4 Outsourced workers are more likely to work in some sectors than others; but seem to be spread across the labour market
#sectors
The three most common sectors for outsourced workers in our survey to be employed within – excluding those with an N size below X (50?) – were administrative and support service activities; water supply, sewerage, waste management and remediation activities; and other service activities.
Five of the twenty employment sectors have at least 1 in 5 of their workforce “outsourced”: more than the average of around 17% across the whole workforce.
Here we explore what proportion of workers in each sector are outsourced.31
The plot below shows the proportion of outsourced and not outsourced workers within each sector. I.e. this is showing what sectors have higher and lower proportions of outsourced workers.
The top three Sectors with the highest proportion of outsourced workers are:
ACTIVITIES OF HOUSEHOLDS AS EMPLOYERS; UNDIFFERENTIATED GOODS- AND SERVICES-PRODUCING ACTIVITIES OF HOUSEHOLDS FOR OWN USE (note that N = 31)
ADMINISTRATIVE AND SUPPORT SERVICE ACTIVITIES
WATER SUPPLY; SEWERAGE, WASTE MANAGEMENT AND REMEDIATION ACTIVITIES
Note that an undefined sector (‘Not found’) contained one of the largest proportions of outsourced workers (31% of workers in the ‘Not found’ category were outsourced).
A key takeaway here is that whereas the total outsourced population is 17%, this figure varies by sector, from 0% for Mining… and Extraterritorial organisations… all the way to 36% for Activities of households as employers, with 5 out of 20 sectors having at least 20% of their workforce outsourced.
#sectors-ogroup
Figure X also shows how the total outsourced group in each sector splits into our three outsourced “sub-groups”. We find – as you might expect, based on its dominance within the group of outsourced workers – that outsourced workers in every sector are most likely to be in the “outsourced sub-group”, i.e. those who self-identified as outsourced workers.
3 Pay
#pay
Using regression analysis, we find that outsourced workers are on average paid £2170 less than non-outsourced workers.
The “outsourced sub-group” earns £3,813 less, and the “agency sub-group” £2,603 less, than the non-outsourced group. This finds that pay is lowest in the “outsourced sub-group” of workers, i.e. those who directly identified themselves as being outsourced. Figure X below shows the median and distribution of pay across the three outsourced sub-groups and the non-outsourced group, for comparison.
#pay-violin
Violin plot for the above
The tables and plots below show descriptive statistics on income and its distribution for outsourced and non-outsourced people. Regression analysis shows that outsourced workers are on average paid £2170 less annually than non-outsourced workers.32 Per week, outsourced workers are on average paid £47 less than non-outsourced workers
The tables and plots below show descriptive statistics on income and its distribution for outsourced groups. Only the full outsourced subgroup has lower income than non-outsourced people. Regression analysis shows that outsourced workers are on average paid £3100 less annually than non-outsourced workers.34 Per week, outsourced workers are on average paid £67 less than non-outsourced workers.
This difference increases to £2950 annually (£63 per week) when we take into account Age, Gender, Education, Ethnicity, Region, and Arrival Time.36 This analysis shows that all other variables, apart from Age, are in some way relevant to income. On average, and controlling for each of the other variables in the model:
Annually:
Men earn £7020 more than women.
People who have a degree earn £8204 more than people without a degree.
Workers in all non-London regions earn less than workers in London
East Midlands: -£5783
East of England: -£4097
North East: -£4862
North West: -£4488
Northern Ireland: -£6569
Scotland: -£5473
South East: -£3420
Wales: -£5384
West Midlands: -£5008
Yorkshire and the Humber: -£5532
People who arrived in the UK within the last year earn £6127 less than people born in the UK
People who arrived in the UK within the last 3 years earn £2423 less than people born in the UK
People who arrived in the UK within the last 5 years earn £2101 less than people born in the UK
People who arrived within the last 30 years earn £3509 more than people born in the UK.
Weekly:
People who have a degree earn £176 more than people without a degree.
Workers in all non-London regions earn less than workers in London
East Midlands: -£124
East of England: -£88
North East: -£104
North West: -£96
Northern Ireland: -£141
Scotland: -£118
South East: -£73
Wales: -£116
West Midlands: -£108
Yorkshire and the Humber: -£119
People who arrived in the UK within the last year earn £132 less than people born in the UK
People who arrived in the UK within the last 3 years earn £52 less than people born in the UK
People who arrived in the UK within the last 5 years earn £45 less than people born in the UK
People who arrived within the last 30 years earn £75 more than people born in the UK.
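The annual and weekly lists above appear to report the same model coefficients on two scales. As a rough consistency check, the sketch below divides each annual regional gap by its weekly counterpart. The implied factor is inferred from the rounded figures quoted here; the analysis does not state which conversion it used, so treat this as an assumption rather than the method.

```python
# (annual gap, weekly gap) in pounds relative to London, copied from the lists above.
regional_gaps = {
    "East Midlands": (5783, 124),
    "East of England": (4097, 88),
    "North East": (4862, 104),
    "North West": (4488, 96),
    "Northern Ireland": (6569, 141),
    "Scotland": (5473, 118),
    "South East": (3420, 73),
    "Wales": (5384, 116),
    "West Midlands": (5008, 108),
    "Yorkshire and the Humber": (5532, 119),
}

# Implied number of paid weeks per year for each region's pair of figures.
implied_weeks = {region: annual / weekly for region, (annual, weekly) in regional_gaps.items()}

for region, weeks in implied_weeks.items():
    print(f"{region}: {weeks:.2f}")
```

If every ratio lands in a narrow band, the two lists are mutually consistent; a region that falls outside it would point to a transcription error in one of the lists.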
3.1 Gender pay gap
#gender-pay-gap
On average within our sample, male workers earn £6400 more than female workers per year; but further exploration of how pay relates to gender for outsourced workers suggests that this gender pay gap doesn’t differ in a statistically significant way depending on whether workers are outsourced or not
For female outsourced workers, this suggests that being an outsourced worker neither exacerbates nor diminishes the gender pay gap they face compared to male workers. Check what this controls for
Exploring the gender pay gap by outsourcing status indicates that the pay gap does not differ depending on whether workers are outsourced or not. For non-outsourced workers, females are paid £5800.82 less than males annually. For outsourced workers, females are paid £6399.50 less than males annually. The difference between non-outsourced and outsourced workers is not significant.
On a weekly basis, the pay gap likewise does not differ depending on whether workers are outsourced or not. For non-outsourced workers, females are paid £124.63 less than males per week. For outsourced workers, females are paid £137.50 less than males per week. The difference between non-outsourced and outsourced workers is not significant.
The gender by outsourcing status is also not relevant for whether a worker is low income (i.e. non-sig relationship with income_group).
A person is more likely to be in the low income group if they are:
Older
Female
Don’t have a degree (or don’t know if they have a degree?)
Are outsourced
Arrived in the UK in the last year
And less likely if they are:
Younger
Male
Have a degree
Live in the North West or Wales (compared to London)
Arrived in the UK in last 30 years
#gender-by-pay-split
Is there already a basic low / high pay split for gender? I know you talk about women being more likely to be in the low-paid group, but again not sure if there is just a basic “women make up x% of low pay group and x% of not low pay group”?
60.34% of outsourced workers in the low pay group were female, compared to 35.85% of outsourced workers in the not low pay group. This difference is statistically significant; women are more likely to be in the low income group. This pattern is the same for non outsourced workers, and there is no interaction effect; irrespective of outsourcing status, women are more likely to be low paid, and irrespective of gender, outsourced people are more likely to be low paid.
#pay-gap-sector
Overall, we find that workers in administrative and support service activities – one of the dominant sectors for outsourced workers in this research – are more likely to be lower-paid than non-outsourced workers in the same sector. The same is true for outsourced water supply (full name; sewerage, waste etc.) workers – another prominent outsourcing sector – information and communication, transportation and storage, and education workers, amongst others. In contrast, we find outsourced workers in financial and insurance activities, for example, appear to be slightly higher paid on average than their non-outsourced counterparts; however, this is one of the few sectors in which this appears to be the case. To be confirmed.
I don’t quite understand the chart below the above chart in the file, would you be able to explain it – thanks! Is this the best chart to use, above? Does this need to control for anything else to show us the most accurate analysis of pay by sector for outsourced and non outsourced, or are we confident that this is showing us something notable about sector and pay?
Here we look at Major subgroup occupations within sectors. We only consider sectors down to ‘Other services’, as the remaining sectors have a small n for the outsourced group. Note you can find larger images for these plots in outputs/figures/occupation_pay_plots.
The figures indicate there is variation between occupations within sectors in terms of whether outsourced people are paid less or more than non-outsourced workers.
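The weekly pay penalty tabulated in section 3.2.1 appears to be the outsourced weighted average weekly income minus the non-outsourced one, so a negative value means outsourced workers earn less. A quick check on three rows copied from that table, under that assumption (the function name is illustrative):

```python
def pay_penalty(income_not_outsourced, income_outsourced):
    """Weekly pay difference; negative means outsourced workers are paid less."""
    return income_outsourced - income_not_outsourced


# (sector, occupation, wtd avg not outsourced, wtd avg outsourced, reported penalty)
rows = [
    ("Manufacturing", "Functional Managers And Directors", 796.9414, 503.0067, -293.934661),
    ("Administrative And Support Service Activities", "Elementary Cleaning Occupations",
     324.9068, 216.1453, -108.761529),
    ("Education", "Teaching Professionals", 674.6332, 591.5488, -83.084349),
]

for sector, occupation, not_out, out, reported in rows:
    computed = pay_penalty(not_out, out)
    # Small differences are expected because the table shows rounded averages.
    print(f"{occupation}: computed {computed:.4f}, reported {reported:.4f}")
```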
3.2.1 Weekly pay penalty in occupations within sectors
There are many instances where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:45
Weekly pay penalty for unit occupations within sectors

| Sector | Unit occupation | Weighted avg. weekly income (not outsourced) | Weighted avg. weekly income (outsourced) | n (not outsourced) | n (outsourced) | Pay penalty |
|---|---|---:|---:|---:|---:|---:|
| Manufacturing | Functional Managers And Directors | 796.9414 | 503.0067 | 35 | 14 | -293.934661 |
| Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles | Functional Managers And Directors | 750.7923 | 535.9459 | 50 | 12 | -214.846399 |
| Information And Communication | Information Technology Technicians | 783.5755 | 588.3637 | 16 | 11 | -195.211801 |
| Administrative And Support Service Activities | Customer Service Occupations | 437.9813 | 317.8058 | 11 | 10 | -120.175519 |
| Administrative And Support Service Activities | Elementary Cleaning Occupations | 324.9068 | 216.1453 | 15 | 16 | -108.761529 |
| Information And Communication | Information Technology Professionals | 935.2414 | 840.9397 | 79 | 26 | -94.301622 |
| Education | Teaching Professionals | 674.6332 | 591.5488 | 283 | 40 | -83.084349 |
| Information And Communication | Functional Managers And Directors | 933.1020 | 864.3060 | 28 | 11 | -68.796060 |
| Human Health And Social Work Activities | Nursing Professionals | 670.7051 | 609.1528 | 177 | 35 | -61.552329 |
| Accommodation And Food Service Activities | Other Elementary Services Occupations | 336.4167 | 276.8655 | 94 | 32 | -59.551254 |
| Education | Teaching And Childcare Support Occupations | 335.0658 | 288.3577 | 137 | 26 | -46.708093 |
| Transportation And Storage | Road Transport Drivers | 606.6916 | 574.2729 | 71 | 19 | -32.418643 |
| Human Health And Social Work Activities | Caring Personal Services | 424.4531 | 396.2457 | 301 | 81 | -28.207324 |
| Human Health And Social Work Activities | Other Health Professionals | 662.8731 | 635.3557 | 50 | 13 | -27.517406 |
| Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles | Road Transport Drivers | 524.6526 | 501.1772 | 37 | 11 | -23.475416 |
| Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles | Shopkeepers And Sales Supervisors | 454.6491 | 434.3528 | 55 | 20 | -20.296337 |
| Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles | Elementary Storage Occupations | 453.8159 | 445.8018 | 53 | 14 | -8.014079 |
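The pay penalty column is consistent with subtracting the non-outsourced weighted average from the outsourced weighted average, so negative values mean outsourced workers earn less. The sketch below shows that arithmetic for a single sector-occupation cell. The data frame `toy`, its columns and the helper `wtd_avg` are hypothetical, and the report's own pipeline may differ in its details.

```r
# Toy respondents for one sector-occupation cell; values are illustrative.
toy <- data.frame(
  outsourced = c(TRUE, TRUE, FALSE, FALSE, FALSE),
  weekly_pay = c(300, 340, 400, 420, 500),
  wt         = c(1, 1, 2, 1, 1)  # stand-in for a survey weight
)

# Hypothetical helper: weighted mean of weekly pay for a subset of rows.
wtd_avg <- function(d) weighted.mean(d$weekly_pay, d$wt)

avg_outsourced     <- wtd_avg(toy[toy$outsourced, ])
avg_not_outsourced <- wtd_avg(toy[!toy$outsourced, ])

# Negative values mean outsourced workers earn less on average.
pay_penalty <- avg_outsourced - avg_not_outsourced
pay_penalty

# The same subtraction using the displayed averages from the first table row
# (Manufacturing, Functional Managers And Directors).
table_row_penalty <- 503.0067 - 796.9414
table_row_penalty
```

Because the displayed averages are rounded to four decimal places, the recomputed difference matches the tabulated penalty only to about that precision.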
3.2.2 Weekly pay penalty in occupations across all sectors
Looking at occupations across all sectors, there are many occupations where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:46
Weekly pay penalty for unit occupations across all sectors

| Unit occupation | Weighted avg. weekly income (not outsourced) | Weighted avg. weekly income (outsourced) | n (not outsourced) | n (outsourced) | Pay penalty |
|---|---:|---:|---:|---:|---:|
| Protective Service Occupations | 798.1466 | 612.0307 | 87 | 11 | -186.11591 |
| Administrative Occupations: Government And Related Organisations | 660.8699 | 487.6173 | 150 | 11 | -173.25254 |
| Information Technology Technicians | 748.9725 | 576.4626 | 90 | 27 | -172.50995 |
| Elementary Administration Occupations | 477.9704 | 352.3927 | 34 | 11 | -125.57767 |
| Functional Managers And Directors | 820.2879 | 731.2395 | 385 | 88 | -89.04844 |
| Business, Research And Administrative Professionals | 845.0837 | 756.2546 | 101 | 18 | -88.82911 |
| Teaching Professionals | 675.1625 | 592.7411 | 293 | 41 | -82.42140 |
| Nursing Professionals | 673.2751 | 602.6935 | 180 | 36 | -70.58157 |
| Sales, Marketing And Related Associate Professionals | 684.0741 | 614.8720 | 155 | 20 | -69.20211 |
| Business Associate Professionals | 685.3387 | 619.4062 | 125 | 21 | -65.93246 |
| Finance Professionals | 805.6680 | 741.1792 | 110 | 20 | -64.48880 |
| Information Technology Professionals | 887.5669 | 825.1680 | 231 | 50 | -62.39887 |
| Finance Associate Professionals | 726.6762 | 672.2575 | 55 | 12 | -54.41873 |
| Teaching And Childcare Support Occupations | 348.2522 | 294.9765 | 156 | 28 | -53.27575 |
| Shopkeepers And Sales Supervisors | 479.0732 | 432.7521 | 87 | 27 | -46.32116 |
| Science, Engineering And Production Technicians | 644.4197 | 604.9067 | 76 | 11 | -39.51294 |
| Secretarial And Related Occupations | 477.0919 | 437.8735 | 146 | 17 | -39.21838 |
| Welfare And Housing Associate Professionals | 525.0780 | 486.4501 | 84 | 10 | -38.62785 |
| Other Elementary Services Occupations | 307.3713 | 269.3387 | 144 | 39 | -38.03264 |
| Customer Service Occupations | 493.9996 | 458.6820 | 163 | 29 | -35.31753 |
| Other Health Professionals | 671.4596 | 638.7357 | 62 | 17 | -32.72395 |
| Road Transport Drivers | 569.7437 | 542.9774 | 154 | 41 | -26.76629 |
| Caring Personal Services | 422.0606 | 396.9387 | 332 | 87 | -25.12189 |
| Elementary Cleaning Occupations | 282.8777 | 264.0183 | 113 | 60 | -18.85941 |
3.3 London has a disproportionate share of the UK’s outsourced workers, followed by the East and West Midlands
#regions
In London, around 25% of workers are outsourced – the highest proportion of any region in the UK. London is followed by the East Midlands (19%) and West Midlands (18%) in the share of workers in the region who are outsourced. The East of England has the lowest share of outsourced workers as part of its total employed workforce, at 13%.
Possible addition: Should this include some comment on WHY we think this might be the case? Should we look at sectoral splits in London, compared to everywhere else, to see whether there are significant sector differences that might explain this trend?
The plot below shows the proportion of workers within each region who are outsourced.47
Below we map the workforce composition in each region. The first map emphasises that London has the highest concentration of outsourced workers (25%).
The second map excludes London so that it is easier to see how the remaining regions compare. After London, the regions with the highest proportion of outsourced workers are:
East Midlands (19%)
West Midlands (18%)
Wales (18%)
North West (17%)
Northern Ireland (16%)
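Regional figures like these depend on which denominator is used. The sketch below contrasts the within-region share listed above with each region's outsourced workers expressed as a share of the whole weighted workforce. The data frame `toy`, its columns and all values are hypothetical, and this is not confirmed to be the report's exact calculation.

```r
# Toy respondents; regions, statuses and weights are illustrative only.
toy <- data.frame(
  region     = c(rep("London", 3), rep("Wales", 5)),
  outsourced = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE),
  wt         = c(1, 1, 2, 1, 1, 1, 1, 1)  # stand-in for a survey weight
)

# Weighted number of outsourced workers and of all workers, per region.
outsourced_wt <- tapply(toy$wt * toy$outsourced, toy$region, sum)
region_wt     <- tapply(toy$wt, toy$region, sum)

# Denominator 1: the region's own workforce.
within_region <- 100 * outsourced_wt / region_wt

# Denominator 2: the whole (weighted) workforce across all regions.
of_total_workforce <- 100 * outsourced_wt / sum(toy$wt)

within_region
of_total_workforce
```

The first measure answers "how common is outsourcing in this region?", while the second reflects where outsourced workers are concentrated. A large region can therefore rank highly on the second measure even with an unremarkable within-region share.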
We can also explore how the entire UK workforce is distributed across the country.48 The table and map below show the percentage of outsourced workers in each region as a proportion of the total UK workforce. They show where the UK’s outsourced workforce is concentrated. The regions with the highest share of the UK’s outsourced workforce are:
---title: "Key findings - matched to report"author: - Jolyon Miles-Wilson - Celestin Okorojidate: "`r format(Sys.time(), '%e %B %Y')`"format: html: self-contained: true code-fold: true code-tools: true code-summary: "Code for Nerds" toc: true toc-depth: 5execute: echo: false warning: falsenumber-sections: true---```{r packages}library(haven)library(poLCA)library(Hmisc)library(dplyr)library(ggplot2)library(tidyr)library(skimr)library(kableExtra)#library(MASS)library(wesanderson)library(ggrepel)library(here)library(emmeans)#library(devtools)#install_version("sjstats", version = "0.18.2")library(sjstats)library(readr)library(sjPlot)library(nnet)``````{r palette}rm(list = ls())options(scipen = 999)colours <- wes_palette("GrandBudapest2",4,"discrete")better_colours <- c('#8dd3c7','#bebada','#fb8072','#80b1d3','#fdb462')many_colours <- c('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f','#ff7f00','#cab2d6','#6a3d9a','#ffff99','#b15928','#8dd3c7','#ffffb3','#bebada','#fb8072','#80b1d3','#fdb462','#b3de69','#fccde5','#d9d9d9','#bc80bd','#ccebc5','#ffed6f')``````{r functions}extract_glm_coefs <- function(mod, only_sig=F, decimal_places = 3){ coefs <- coef(summary(mod)) if(only_sig==T){ coefs <- coefs[which(coefs[,4] < .05),] } coefs <- as_tibble(coefs, rownames="variable") %>% # specify new variable to add rownames to mutate( or = round(exp(Estimate), decimal_places), .after=Estimate )}extract_lm_coefs <- function(mod, only_sig = F){ coefs <- coef(summary(mod)) if(only_sig==T){ coefs <- coefs[which(coefs[,4] < .05),] } coefs <- as_tibble(coefs, rownames="variable") # specify new variable to add rownames to }``````{r data, output=FALSE}#data <- haven::read_sav("../Data/2024-04-25 - Cleaned_Data.sav")data <- readRDS("../Data/2024-09-30 - Cleaned_Data.rds") # Make our disaggregated ethnicity groups identical to the originla census groups# as per request in JRF - Outsourced workers - To dos data$Ethnicity_collapsed_disaggregated <- 
data$Ethnicity_labelleddata <- data %>% mutate( Ethnicity_collapsed = case_when( # Grouping White ethnicities Ethnicity %in% c(1) ~ "White British", # Just white british as reference Ethnicity %in% c(2, 5) ~ "White other", # Irish and White other grouped together # Grouping Asian ethnicities Ethnicity %in% c(10,11,12,13,14) ~ "Asian/Asian British", # Grouping Black ethnicities Ethnicity %in% c(15,16,17) ~ "Black/African/Caribbean/Black British", # Grouping Mixed ethnicities Ethnicity %in% c(6,7,8,9) ~ "Mixed/Multiple ethnic group", # Grouping Other ethnicities Ethnicity %in% c(18) ~ "Arab/British Arab", # Handling missing or ambiguous categories Ethnicity %in% c(3,4,19) ~ "Other ethnic group", #prefer not to say Ethnicity %in% c(20,21) ~ "Prefer not to say", # Default case for any unmatched entries TRUE ~ "Prefer not to say" ) )#make white the reference categorydata$Ethnicity_collapsed <- relevel(factor(data$Ethnicity_collapsed), ref = "White British")data <- data %>% mutate( Has_Degree = factor(Has_Degree, levels = c("No", "Yes", "Don't know")) )# make binary born uk varcategories <- as.vector(unique(data$BORNUK_labelled))non_categories <- categories[!(categories %in% "I was born in the UK")]# Will throw NA warning. I think this OK but investigate how to avoid the problemdata <- data %>% mutate( BORNUK_binary = forcats::fct_collapse(BORNUK_labelled, "Born in UK" = "I was born in the UK", "Not born in UK" = non_categories) ) # make binary ethnicity varethnicities <- as.vector(unique(data$Ethnicity_collapsed))non_white_ethnicities <- ethnicities[!(ethnicities %in% "White British")]# Will throw NA warning. 
I think this OK but investigate how to avoid the problemdata <- data %>% mutate( Ethnicity_binary = forcats::fct_collapse(Ethnicity_collapsed, "White British" = c("White British"), "Non-White British" = non_white_ethnicities) )income_data <- filter(data, income_drop_all==0)```# Chapter 2: How many outsourced workers are there in the UK?## How many UK workers are outsourced?::: {.callout-tip title="#how-many"}- Around 1 in 6 UK workers meet our definition of an outsourced worker- The 'outsourced sub-group' is the most dominant of the three sub-groups - meaning the total group is predominantly made up of people who self-identify as an outsourced worker and they say they are hired to do work that is long-term or ongoing. People included in this sub-group (either uniquely, or while also meeting the criteria for at least one of the other sub-groups) make up around 67% (check) of our total outsourced group, or nearly 7 in 10. This group makes up X of all UK workers.:::```{r sum-outsourced}total_outsourced <- data %>% group_by(outsourcing_status) %>% summarise( Sum = sum(NatRepemployees), n = n() ) %>% mutate( Proportion = Sum / sum(Sum), Percentage = 100 * Proportion, N = sum(n) )readr::write_csv(total_outsourced, file="../outputs/data/total_outsourced.csv")# Create function to find nearest denominator to express as a fraction.f <- function(x) ifelse(abs(1/floor(1/x) - x) < abs(1/ceiling(1/x) - x),floor(1/x),ceiling(1/x))```**1 in `r f(total_outsourced$Proportion[which(total_outsourced$outsourcing_status=="Outsourced")])` (`r round(total_outsourced$Percentage[which(total_outsourced$outsourcing_status=="Outsourced")], 0)`%) of UK workers are outsourced.**[^1][^1]: [outputs/data/total_outsourced.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/total_outsourced.csv)```{r sum-outsourcing-group}total_outsourced_group <- data %>% group_by(outsourcing_group) %>% summarise( Sum = sum(NatRepemployees), n = n(), ) %>% mutate( Proportion = Sum / sum(Sum), 
Percentage = 100 * Proportion, N = sum(n) )readr::write_csv(total_outsourced_group, file="../outputs/data/total_outsourced_2.csv")```In terms of the the different possible types of outsourced groups[^2], the numbers are as follows:[^2]: [outputs/data/total_outsourced_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/total_outsourced_2.csv)1. Definitely outsourced: `r round(total_outsourced_group$Percentage[which(total_outsourced_group$outsourcing_group=="Outsourced")], 0)`%2. Likely agency: `r round(total_outsourced_group$Percentage[which(total_outsourced_group$outsourcing_group=="Likely agency")], 0)`%3. High indicators: `r round(total_outsourced_group$Percentage[which(total_outsourced_group$outsourcing_group=="High indicators")], 0)`%```{r}breakdown <- data %>%filter(outsourcing_status=="Outsourced") %>%group_by(outsourcing_group) %>%summarise(freq =sum(NatRepemployees),n =n() ) %>%mutate(total =sum(freq),percentage =100* (freq/total),N =sum(n) )breakdown2 <- data %>%group_by(outsourcing_group) %>%summarise(freq =sum(NatRepemployees),n =n() ) %>%mutate(total =sum(freq),percentage =100* (freq/total),N =sum(n) )```People included in this sub-group (either uniquely, or while also meeting the criteria for at least one of the other sub-groups) make up around `r round(breakdown[which(breakdown$outsourcing_group=="Outsourced"),"percentage"],0)`% of our total outsourced group. This group makes up `r round(breakdown2[which(breakdown2$outsourcing_group=="Outsourced"),"percentage"],0)`% of all UK workers.:::{.callout-tip title='#non-exclusive-subgroups1'}- The two other sub-groups – the agency and indicators sub-groups – are less dominant in comparison. Around 58% of all respondents meet the criteria for either or both of these sub-groups, but this falls to around 33% if we exclude people who are already captured in the outsourced sub-group. 
Excluding the first sub-group, these other two groups makes up X of all UK workers.**The percentages here refer to the number of people who are outsourced (super-ordinate group), not the total number of respondents.** Below I provide percentages as function of the outsourced super-ordinate group as well as the total sample:::Group criteria- **Outsourced**, defined as responding 'I am sure I am outsourced' or 'I might be outsourced', and responding 'I do work on a long-term basis'.- **Likely agency**, defined as those responding 'I am sure I am agency' and 'I do work on a long-term basis', **excluding** those people who are already defined as being outsourced.- **High indicators**: defined as responding TRUE to 5 or 6 of the outsourcing indicators, as well as responding 'I do work on a long-term basis', **excluding** those people who are already defined as outsourced or likely agency.```{r}# non mutually exclusivegroups_non_excl <- data %>%mutate(# SURE outsourced or MIGHT BE outsourced + LONGTERMoutsourced =ifelse((Q3v3a ==1& Q2 ==1) | (Q3v3a ==2& Q2 ==1), 1, 0),# NOT outsourced, SURE agency, and LONG-TERMlikely_agency =ifelse(Q2 ==1& (Q3v3b ==1| Q3v3c ==1| Q3v3d ==1), 1, 0),likely_agency =ifelse(is.na(likely_agency), 0, likely_agency),# NOT outsourced, NOT likely agency, 5 or more indicators, & LONGTERMhigh_indicators =ifelse((Q2 ==1& sum_true >=5), 1, 0) )either <- groups_non_excl %>%mutate(agency_or_indicator =case_when((likely_agency ==1& high_indicators ==0) ~"agency", (likely_agency ==0& high_indicators ==1) ~"indicator", (likely_agency ==1& high_indicators ==1) ~"both", (likely_agency ==0& high_indicators ==0) ~"neither",TRUE~NA) ) %>%group_by(agency_or_indicator) %>%summarise(freq =sum(NatRepemployees),n =n () ) %>%mutate(total =sum(freq),perc =100* (freq/total),N =sum(n) )either_perc <- either %>%filter(agency_or_indicator !="neither") %>%summarise(round(sum(perc),2) # perc or weighted perc? 
) %>%pull()either_excl_outsourced <- groups_non_excl %>%filter(outsourced==0) %>%mutate(agency_or_indicator =case_when((likely_agency ==1& high_indicators ==0) ~"agency", (likely_agency ==0& high_indicators ==1) ~"indicator", (likely_agency ==1& high_indicators ==1) ~"both", (likely_agency ==0& high_indicators ==0) ~"neither",TRUE~NA) ) %>%group_by(agency_or_indicator) %>%summarise(freq =sum(NatRepemployees),n =n () ) %>%mutate(total =sum(freq),perc =100* (freq/total),N =sum(n) )either_excl_perc <- either_excl_outsourced %>%filter(agency_or_indicator !="neither") %>%summarise(round(sum(perc),2) ) %>%pull()either %>%kable(caption ="Including outsourced group") %>%kable_styling(full_width = F)either_excl_outsourced %>%kable(caption ="Exluding outsourced group") %>%kable_styling(full_width = F)````r either_perc`% of the whole sample meet the criteria for either or both of these sub-groups. This falls to `r either_excl_perc`% if we exclude people who are already captured in the outsourced sub-group.```{r}# same as above but now only among those who are outsourcedgroups_non_excl <- data %>%filter(outsourcing_status=="Outsourced") %>%mutate(# SURE outsourced or MIGHT BE outsourced + LONGTERMoutsourced =ifelse((Q3v3a ==1& Q2 ==1) | (Q3v3a ==2& Q2 ==1), 1, 0),# NOT outsourced, SURE agency, and LONG-TERMlikely_agency =ifelse(Q2 ==1& (Q3v3b ==1| Q3v3c ==1| Q3v3d ==1), 1, 0),likely_agency =ifelse(is.na(likely_agency), 0, likely_agency),# NOT outsourced, NOT likely agency, 5 or more indicators, & LONGTERMhigh_indicators =ifelse((Q2 ==1& sum_true >=5), 1, 0) )either <- groups_non_excl %>%mutate(agency_or_indicator =case_when((likely_agency ==1& high_indicators ==0) ~"agency", (likely_agency ==0& high_indicators ==1) ~"indicator", (likely_agency ==1& high_indicators ==1) ~"both", (likely_agency ==0& high_indicators ==0) ~"neither",TRUE~NA) ) %>%group_by(agency_or_indicator) %>%summarise(freq =sum(NatRepemployees),n =n () ) %>%mutate(total =sum(freq),perc =100* (freq/total),N 
=sum(n) )either_perc <- either %>%filter(agency_or_indicator !="neither") %>%summarise(round(sum(perc),2) ) %>%pull()either_excl_outsourced <- groups_non_excl %>%filter(outsourced==0) %>%mutate(agency_or_indicator =case_when((likely_agency ==1& high_indicators ==0) ~"agency", (likely_agency ==0& high_indicators ==1) ~"indicator", (likely_agency ==1& high_indicators ==1) ~"both", (likely_agency ==0& high_indicators ==0) ~"neither",TRUE~NA) ) %>%group_by(agency_or_indicator) %>%summarise(freq =sum(NatRepemployees),n =n () ) %>%mutate(total =sum(freq),perc =100* (freq/total),N =sum(n) )either_excl_perc <- either_excl_outsourced %>%filter(agency_or_indicator !="neither") %>%summarise(round(sum(perc),2) ) %>%pull()n_outsourced <- total_outsourced[which(total_outsourced$outsourcing_status=="Outsourced"), "n"] %>%pull()either_incl_perc <- either %>%filter(agency_or_indicator !="neither") %>%summarise(round(100* (sum(n) / n_outsourced),2) ) %>%pull()either_excl_perc <- either_excl_outsourced %>%filter(agency_or_indicator !="neither") %>%summarise(round(100* (sum(n) / n_outsourced),2) ) %>%pull()```Out of those who are in the 'outsourced' status (i.e., the combination of the three outsourced groups), `r either_incl_perc`% meet the criteria for either or both of these sub-groups, but this falls to around `r either_excl_perc`% if we exclude people who are already captured in the outsourced sub-group.:::{.callout-tip title="#non-exclusive-subgroups2"}- There is some overlap between these sub-groups, but they are not like for like. 
Just over a quarter (27%) of respondents are in more than one sub-group, while nearly three quarters (73%) of respondents are uniquely captured in just one of the three sub-groups.:::```{r}groups_count <- data %>%filter(outsourcing_status=="Outsourced") %>%mutate(# SURE outsourced or MIGHT BE outsourced + LONGTERMoutsourced =ifelse((Q3v3a ==1& Q2 ==1) | (Q3v3a ==2& Q2 ==1), 1, 0),# NOT outsourced, SURE agency, and LONG-TERMlikely_agency =ifelse(Q2 ==1& (Q3v3b ==1| Q3v3c ==1| Q3v3d ==1), 1, 0),likely_agency =ifelse(is.na(likely_agency), 0, likely_agency),# NOT outsourced, NOT likely agency, 5 or more indicators, & LONGTERMhigh_indicators =ifelse((Q2 ==1& sum_true >=5), 1, 0),number_of_groups =rowSums(across(c(outsourced,likely_agency,high_indicators))) ) %>%group_by(number_of_groups) %>%summarise(total =sum(NatRepemployees),n =n() ) %>%mutate(wtd_percentage =100* (n/sum(n)),percentage =100* (total /sum(total)) )write_csv(groups_count, file="../outputs/data/number_of_groups.csv")```Just over a quarter (`r round(groups_count[which(groups_count$number_of_groups==2),"percentage"] + groups_count[which(groups_count$number_of_groups==3),"percentage"],2)`%) of respondents are in more than one sub-group, while nearly three quarters (`r round(groups_count[which(groups_count$number_of_groups==1),"percentage"],2)`%) of respondents are uniquely captured in just one of the three sub-groups.^[[outputs/data/number_of_groups.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/number_of_groups.csv)]## Evaluating our total estimate::: {.callout-important title="#evaluating-total-estimate To do"}- Around 1 in 4 "outsourced" respondents sit in more than one sub-group within our definition, but around 3 in 4 are uniquely captured in just one of the three sub-groups - predominantly in the outsourced sub-group.- As figure X shows, not all respondents in the outsourced sub-group said yes five or six of our six outsourcing:::# Chapter 3: Who are the UK’s outsourced 
workers?## Demographic breakdown {#sec-demographic-breakdown}Demographic variables:- Categorical - [x] Gender - [x] Ethnicity- Numeric - [x] Age - in age section: @sec-ageWe want them broken down by - outsourcing status - high low pay- outsourcing group - high low pay### Ethnicity by outsourcing status```{r}# pollster# crosstab(df = data, x = outsourcing_status, y = Ethnicity_collapsed, weight = NatRepemployees) %>%# kable()# # # base r# tab <- as.data.frame(xtabs(NatRepemployees ~ outsourcing_status + Ethnicity_collapsed, data=data))# test <- xtabs(NatRepemployees ~ outsourcing_status + income_group + Ethnicity_collapsed, data=data)# prop.table(test)# # percent_row <- 100 * prop.table(test, margin = 1)# test2 <- as.data.frame(percent_row)# # test2 %>%# filter(outsourcing_status=="Outsourced") %>%# summarise(sum(Freq))```#### Collapsed ethnicity^[[outputs/data/status_by_ethnicity.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/status_by_ethnicity.csv)]```{r}tab <- data %>%group_by(outsourcing_status, Ethnicity_collapsed) %>%summarise(n =n(), # count casesFrequency =sum(NatRepemployees) # count weighted cases ) %>%mutate(N =sum(n),Sum =sum(Frequency),Percentage =100* (Frequency / Sum),Ethnicity_short = Ethnicity_collapsed )write_csv(tab, file="../outputs/data/status_by_ethnicity.csv")tab %>%pivot_wider(id_cols = outsourcing_status,names_from = Ethnicity_collapsed, values_from = Percentage ) %>%kable(caption ="Ethnicity by outsourcing status (%)",digits =2) %>%kable_styling(full_width = F)```#### Full ethnicity^[[outputs/data/status_by_ethnicity_full.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/status_by_ethnicity_full.csv)]```{r}tab <- data %>%group_by(outsourcing_status, Ethnicity_labelled) %>%summarise(n =n(), # count casesFrequency =sum(NatRepemployees) # count weighted cases ) %>%mutate(N =sum(n),Sum =sum(Frequency),Percentage =100* (Frequency / Sum) )write_csv(tab, 
file="../outputs/data/status_by_ethnicity_full.csv")tab %>%pivot_wider(id_cols = outsourcing_status,names_from = Ethnicity_labelled, values_from = Percentage ) %>%kable(caption ="Ethnicity by outsourcing status (%)",digits =2) %>%kable_styling(full_width = F)```#### By high/low pay##### Collapsed ethnicity^[[outputs/data/status_by_ethnicity_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/status_by_ethnicity_income_group.csv)]```{r}tab <- income_data %>%filter(!is.na(income_group)) %>%group_by(outsourcing_status, income_group, Ethnicity_collapsed) %>%summarise(n =n(), # count casesFrequency =sum(NatRepemployees) # count weighted cases ) %>%mutate(N =sum(n),Sum =sum(Frequency),Percentage =100* (Frequency / Sum),Ethnicity_short = Ethnicity_collapsed )write_csv(tab, file="../outputs/data/status_by_ethnicity_income_group.csv")tab %>%pivot_wider(id_cols =c(outsourcing_status,income_group),names_from = Ethnicity_collapsed, values_from = Percentage )%>%kable(caption ="Ethnicity by outsourcing status and income group(%)",digits =2) %>%kable_styling(full_width = F)```##### Full ethnicity^[[outputs/data/status_by_ethnicity_full_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/status_by_ethnicity_full_income_group.csv)]```{r}tab <- income_data %>%filter(!is.na(income_group)) %>%group_by(outsourcing_status, income_group, Ethnicity_labelled) %>%summarise(n =n(), # count casesFrequency =sum(NatRepemployees) # count weighted cases ) %>%mutate(N =sum(n),Sum =sum(Frequency),Percentage =100* (Frequency / Sum),Ethnicity_short = Ethnicity_labelled )write_csv(tab, file="../outputs/data/status_by_ethnicity_full_income_group.csv")tab %>%pivot_wider(id_cols =c(outsourcing_status,income_group),names_from = Ethnicity_labelled, values_from = Percentage )%>%kable(caption ="Ethnicity by outsourcing status and income group(%)",digits =2) %>%kable_styling(full_width = F)```### Ethnicity by oustourcing group#### 
Collapsed ethnicity^[[outputs/data/group_by_ethnicity.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/group_by_ethnicity.csv)]```{r}tab <- data %>%group_by(outsourcing_group, Ethnicity_collapsed) %>%summarise(n =n(), # count casesFrequency =sum(NatRepemployees) # count weighted cases ) %>%mutate(N =sum(n),Sum =sum(Frequency),Percentage =100* (Frequency / Sum),Ethnicity_short = Ethnicity_collapsed ) write_csv(tab, file="../outputs/data/group_by_ethnicity.csv")tab %>%pivot_wider(id_cols = outsourcing_group,names_from = Ethnicity_collapsed, values_from = Percentage )%>%kable(caption ="Ethnicity by outsourcing group (%)",digits =2) %>%kable_styling(full_width = F)```#### Full ethnicity^[[outputs/data/group_by_ethnicity_full.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/group_by_ethnicity_full.csv)]```{r}tab <- data %>%group_by(outsourcing_group, Ethnicity_labelled) %>%summarise(n =n(), # count casesFrequency =sum(NatRepemployees) # count weighted cases ) %>%mutate(N =sum(n),Sum =sum(Frequency),Percentage =100* (Frequency / Sum),Ethnicity_short = Ethnicity_labelled ) write_csv(tab, file="../outputs/data/group_by_ethnicity_full.csv")tab %>%pivot_wider(id_cols = outsourcing_group,names_from = Ethnicity_labelled, values_from = Percentage )%>%kable(caption ="Ethnicity by outsourcing group (%)",digits =2) %>%kable_styling(full_width = F)```#### By high/low pay##### Collapsed ethnicity^[[outputs/data/group_by_ethnicity_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/group_by_ethnicity_income_group.csv)]```{r}tab <- income_data %>%filter(!is.na(income_group)) %>%group_by(outsourcing_group, income_group, Ethnicity_collapsed) %>%summarise(n =n(), # count casesFrequency =sum(NatRepemployees) # count weighted cases ) %>%mutate(N =sum(n),Sum =sum(Frequency),Percentage =100* (Frequency / Sum),Ethnicity_short = Ethnicity_collapsed ) write_csv(tab, 
file="../outputs/data/group_by_ethnicity_income_group.csv")tab %>%pivot_wider(id_cols =c(outsourcing_group,income_group),names_from = Ethnicity_collapsed, values_from = Percentage )%>%kable(caption ="Ethnicity by outsourcing status and income group(%)",digits =2) %>%kable_styling(full_width = F)```##### Full ethnicity^[[outputs/data/group_by_ethnicity_full_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/group_by_ethnicity_full_income_group.csv)]```{r}tab <- income_data %>%filter(!is.na(income_group)) %>%group_by(outsourcing_group, income_group, Ethnicity_labelled) %>%summarise(n =n(), # count casesFrequency =sum(NatRepemployees) # count weighted cases ) %>%mutate(N =sum(n),Sum =sum(Frequency),Percentage =100* (Frequency / Sum),Ethnicity_short = Ethnicity_labelled ) write_csv(tab, file="../outputs/data/group_by_ethnicity_full_income_group.csv")tab %>%pivot_wider(id_cols =c(outsourcing_group,income_group),names_from = Ethnicity_labelled, values_from = Percentage )%>%kable(caption ="Ethnicity by outsourcing status and income group(%)",digits =2) %>%kable_styling(full_width = F)```### Gender by outsourcing status^[[outputs/data/status_by_gender.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/status_by_gender.csv)]```{r}tab <- data %>%group_by(outsourcing_status, Gender) %>%summarise(n =n(), # count casesFrequency =sum(NatRepemployees) # count weighted cases ) %>%mutate(N =sum(n),Sum =sum(Frequency),Percentage =100* (Frequency / Sum) ) write_csv(tab, file="../outputs/data/status_by_gender.csv")tab %>%pivot_wider(id_cols = outsourcing_status,names_from = Gender, values_from = Percentage )%>%kable(caption ="Gender by outsourcing status (%)",digits =2) %>%kable_styling(full_width = F)```#### By high/low pay^[[outputs/data/status_by_gender_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/status_by_gender_income_group.csv)]```{r}tab <- income_data 
%>% filter(!is.na(income_group)) %>%
  group_by(outsourcing_status, income_group, Gender) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

write_csv(tab, file = "../outputs/data/status_by_gender_income_group.csv")

tab %>%
  pivot_wider(id_cols = c(outsourcing_status, income_group), names_from = Gender, values_from = Percentage) %>%
  kable(caption = "Gender by outsourcing status and income group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

### Gender by outsourcing group^[[outputs/data/group_by_gender.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/group_by_gender.csv)]

```{r}
tab <- data %>%
  group_by(outsourcing_group, Gender) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

write_csv(tab, file = "../outputs/data/group_by_gender.csv")

tab %>%
  pivot_wider(id_cols = outsourcing_group, names_from = Gender, values_from = Percentage) %>%
  kable(caption = "Gender by outsourcing group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

#### By high/low pay^[[outputs/data/group_by_gender_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/group_by_gender_income_group.csv)]

```{r}
tab <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_group, income_group, Gender) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

write_csv(tab, file = "../outputs/data/group_by_gender_income_group.csv")

tab %>%
  pivot_wider(id_cols = c(outsourcing_group, income_group), names_from = Gender, values_from = Percentage) %>%
  kable(caption = "Gender by outsourcing group and income group (%)", digits = 2) %>%
  kable_styling(full_width = F)
```

## Evidence paints a racialised picture of outsourcing in the UK, with links to both ethnicity and migration

::: {.callout-tip title="#ethnicity"}
- Nearly 1 in 3 outsourced workers are from an ethnic minority background.
- Workers from ethnic minority backgrounds are over-represented in outsourced work in the UK, and are typically more likely to be outsourced than White British workers.
- Overall, 22% of non-outsourced workers are from an ethnic minority background, rising to 33% of outsourced workers – a difference of more than ten percentage points. This means that while just over 1 in 5 non-outsourced workers in our sample were from an ethnic minority background, nearly 1 in 3 outsourced workers were.
- People from an ethnic minority background are overall 1.75 times more likely to be outsourced than people from a White British background.
- Workers from Arab backgrounds are 3.86 times more likely than White British workers to be outsourced (check sample size – are we confident in all of these significance tests, or should we just use some of them in these bullet points?).
- Workers from Black backgrounds are 2.33 times more likely than White British workers to be outsourced.
- Workers from Asian backgrounds are 1.98 times more likely than White British workers to be outsourced.
- Workers from Mixed Ethnicity backgrounds are 1.86 times more likely than White British workers to be outsourced.
- White other workers are 1.30 times more likely than White British workers to be outsourced.
:::

```{r ethnicity-counts}
ethnicity_statistics <- data %>%
  group_by(outsourcing_status, Ethnicity_collapsed) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum),
    Ethnicity_short = Ethnicity_collapsed
  ) %>%
  separate_wider_delim(Ethnicity_short,
    names = c("Ethnicity_short", "Ethnicity detail"),
    delim = stringr::regex(" / |, "), # use multiple delims
    too_few = "align_start", too_many = "merge"
  )

readr::write_csv(ethnicity_statistics, file = "../outputs/data/ethnicity_stats_1.csv")
```

```{r ethnicity_binary_inferential, output=FALSE}
ethnicities <- as.vector(unique(data$Ethnicity_collapsed))
non_white_ethnicities <- ethnicities[!(ethnicities %in% "White British")]

# Will throw NA warning. I think this is OK but investigate how to avoid the problem
data <- data %>%
  mutate(
    Ethnicity_binary = forcats::fct_collapse(Ethnicity_collapsed,
      "White British" = c("White British"),
      "Non-White British" = non_white_ethnicities
    )
  )

mod <- glm(outsourcing_status ~ Ethnicity_binary, data, weights = NatRepemployees, family = "quasibinomial")
# mod <- glm(Ethnicity_binary~outsourcing_status , data, weights = NatRepemployees, family="quasibinomial")
# summary(mod)

coefs <- extract_glm_coefs(mod)
write_csv(coefs, file = "../outputs/data/ethnicity_binary_o-status_inferential_tab.csv")
```

People from an ethnic minority are `r round(coefs[2, 'or'],2)` times more likely to be outsourced than people from a White British background; `r round(100 - ethnicity_statistics[which(ethnicity_statistics$outsourcing_status == "Outsourced" & ethnicity_statistics$Ethnicity_collapsed == "White British"), "Percentage"],2)`% of outsourced workers are from an ethnic minority, compared to `r round(100 - ethnicity_statistics[which(ethnicity_statistics$outsourcing_status == "Not outsourced" & ethnicity_statistics$Ethnicity_collapsed == "White British"), "Percentage"],2)`% of non-outsourced workers.[^3]

[^3]: [outputs/data/ethnicity_stats_1.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_stats_1.csv) & [outputs/data/ethnicity_binary_o-status_inferential_tab.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_binary_o-status_inferential_tab.csv)

```{r ethnicity-plot}
data %>%
  group_by(outsourcing_status, Ethnicity_binary) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  ) %>%
  ggplot(., aes(outsourcing_status, Percentage, fill = Ethnicity_binary)) +
  geom_col(colour = "black") +
  annotate("text", x = ethnicity_statistics$outsourcing_status, y = 99, label = paste0("N = ", ethnicity_statistics$N), hjust = 1) +
  coord_flip() +
  scale_fill_manual(values = many_colours, name = "Ethnicity") +
  xlab("Outsourcing group") +
  theme_minimal()
```

```{r}
#| output: false
#| warning: false
#| message: false
mod_2 <- glm(income_group ~ Ethnicity_collapsed * outsourcing_status, income_data, family = "quasibinomial", weights = NatRepemployees)
summary(mod_2)

mod_3 <- glm(income_group ~ Ethnicity_binary * outsourcing_status, data, family = "quasibinomial", weights = NatRepemployees)
summary(mod_3)

black_coef <- extract_glm_coefs(mod_2,, only_sig = T)[5, "Estimate"] %>%
  exp(.) %>%
  round(., 2) %>%
  pull()
other_coef <- extract_glm_coefs(mod_2,, only_sig = T)[6, "Estimate"] %>%
  exp(.) %>%
  round(., 2) %>%
  pull()
```

Overall, there is no interaction between being from an ethnic minority and being outsourced in predicting low pay; i.e., being both from an ethnic minority and outsourced is not associated with being in the low pay group.^[[outputs/data/ethnicity_binary_income_group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/outputs/data/ethnicity_binary_income_group.csv)]

However, there is nuance within the groups. There is evidence to suggest that people who are Black and outsourced are less likely to be in the high income group (OR = `r black_coef`x). People who are from an 'Other' ethnic group and outsourced are more likely to be in the high income group (OR = `r other_coef`x)
(see tables in @sec-demographic-breakdown).

```{r}
tab_split <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_status, income_group, Ethnicity_binary) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

tab_split %>%
  pivot_wider(id_cols = c(outsourcing_status, income_group), names_from = Ethnicity_binary, values_from = Percentage) %>%
  kable(caption = "Ethnicity (binary) by outsourcing status and income group (%)", digits = 2) %>%
  kable_styling(full_width = F)

write_csv(tab_split, "../outputs/data/ethnicity_binary_income_group.csv")

tab_split %>%
  ggplot(., aes(outsourcing_status, Percentage, fill = Ethnicity_binary)) +
  facet_grid(rows = vars(income_group)) +
  geom_col(colour = "black") +
  coord_flip() +
  scale_fill_manual(values = many_colours, name = "Ethnicity") +
  xlab("") +
  theme_minimal()
```

```{r ethnicity-interential-status}
mod <- glm(outsourcing_status ~ Ethnicity_collapsed, data, weights = NatRepemployees, family = "quasibinomial")
# summary(mod)

coef_table <- extract_glm_coefs(mod) %>%
  mutate(across(where(is.numeric), ~round(.x,2)))
rownames(coef_table) <- coef_table$variable

sig_coefs <- extract_glm_coefs(mod, only_sig = T)
# set rownames so we can index
rownames(sig_coefs) <- sig_coefs$variable
# get labels for piping
ethnicity_keys <- sig_coefs$variable
ethnicity_labs <- sub(".*collapsed","",ethnicity_keys)

write_csv(coef_table, file = "../outputs/data/ethnicity_model_inferential.csv")
```

Comparison of ethnicities indicates that some groups are statistically more likely to be outsourced than others[^4]:

[^4]: [outputs/data/ethnicity_model_inferential.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_model_inferential.csv)

- `r ethnicity_labs[2]` workers are `r sig_coefs[ethnicity_keys[2], "or"]` times more likely than White British workers to be outsourced.
- `r ethnicity_labs[3]` workers are `r sig_coefs[ethnicity_keys[3], "or"]`
times more likely than White British workers to be outsourced.
- `r ethnicity_labs[4]` workers are `r sig_coefs[ethnicity_keys[4], "or"]` times more likely than White British workers to be outsourced.
- `r ethnicity_labs[5]` workers are `r sig_coefs[ethnicity_keys[5], "or"]` times more likely than White British workers to be outsourced.
- `r ethnicity_labs[6]` workers are `r sig_coefs[ethnicity_keys[6], "or"]` times more likely than White British workers to be outsourced.
- `r ethnicity_labs[7]` workers are `r sig_coefs[ethnicity_keys[7], "or"]` times more likely than White British workers to be outsourced.

```{r}
mod <- glm(outsourcing_status ~ Ethnicity_collapsed_disaggregated, data, weights = NatRepemployees, family = "quasibinomial")
# summary(mod)

coef_table <- extract_glm_coefs(mod) %>%
  mutate(across(where(is.numeric), ~round(.x,2)))
rownames(coef_table) <- coef_table$variable

sig_coefs <- extract_glm_coefs(mod, only_sig = T) %>%
  mutate(across(where(is.numeric), ~round(.x,2)))
rownames(sig_coefs) <- sig_coefs$variable

ethnicity_keys <- sig_coefs$variable
ethnicity_labs <- sub(".*disaggregated","",ethnicity_keys)

write_csv(coef_table, file = "../outputs/data/ethnicity_model_inferential_2.csv")
```

Comparison of more disaggregated ethnicities indicates more nuance[^5]:

[^5]: [outputs/data/ethnicity_model_inferential_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_model_inferential_2.csv)

- `r ethnicity_labs[2]` workers are `r sig_coefs[ethnicity_keys[2], "or"]` times more likely than White British workers to be outsourced.
- `r ethnicity_labs[3]` workers are `r sig_coefs[ethnicity_keys[3], "or"]` times more likely than White British workers to be outsourced.
- `r ethnicity_labs[4]` workers are `r sig_coefs[ethnicity_keys[4], "or"]` times more likely than White British workers to be outsourced.
- `r ethnicity_labs[5]` workers are `r sig_coefs[ethnicity_keys[5], "or"]` times more likely than White British workers to be outsourced.
- `r ethnicity_labs[6]`
workers are `r sig_coefs[ethnicity_keys[6], "or"]` times more likely than White British workers to be outsourced.
- `r ethnicity_labs[7]` workers are `r sig_coefs[ethnicity_keys[7], "or"]` times more likely than White British workers to be outsourced.
- `r ethnicity_labs[8]` workers are `r sig_coefs[ethnicity_keys[8], "or"]` times more likely than White British workers to be outsourced.
- `r ethnicity_labs[9]` workers are `r sig_coefs[ethnicity_keys[9], "or"]` times more likely than White British workers to be outsourced.
- `r ethnicity_labs[10]` workers are `r sig_coefs[ethnicity_keys[10], "or"]` times more likely than White British workers to be outsourced.
- `r ethnicity_labs[11]` workers are `r sig_coefs[ethnicity_keys[11], "or"]` times more likely than White British workers to be outsourced.

```{r}
#| include: false
count <- data %>%
  group_by(Ethnicity_collapsed) %>%
  summarise(
    count = n(),
    freq = sum(NatRepemployees)
  )

cis <- confint(mod, level = .95)
```

::: {.callout-tip title="#ethnicity-sub-group"}
- These differences in ethnicity also shift slightly depending on which outsourced “sub-group” we look at. For example, compared to White British workers, Black outsourced workers are more likely to be in the “outsourced sub-group”, meaning they have self-identified as outsourced, or the “agency sub-group”, meaning they are agency workers doing more long-term and ongoing work. **Are there any other interesting points to mention here? Should we do a chart showing this difference across sub-groups?
Do we need an interpretive comment in this section?**
:::

```{r ethnicity-group}
mod <- multinom(outsourcing_group ~ Ethnicity_collapsed, data, weights = NatRepemployees)
# summary(mod)

# get coefficients and calculate p
coefs <- summary(mod)$coefficients
# get predicted group names to insert later
group <- rownames(coefs)
ors <- exp(coefs)
colnames(ors) <- paste(colnames(ors), "or", sep = "_")

z <- coefs / summary(mod)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
colnames(p) <- paste(colnames(p), "p", sep = "_")

p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))
sig_ors <- exp(summary(mod)$coefficients * p_2)

# add to table for saving
coefs2 <- cbind(coefs, ors, p) %>%
  as_tibble() %>%
  mutate(
    predicted_group = group, .before = everything() # insert predicted group so output table can be better interpreted
  )

write_csv(coefs2, file = "../outputs/data/ethnicity_ogroup_inferential_tab.csv")
# sig_ors
```

Breaking down by outsourcing group helps to separate out the *type* of outsourced work people from the ethnicities identified above engage in.[^6] Compared to White British workers:

[^6]: [outputs/data/ethnicity_ogroup_inferential_tab.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_ogroup_inferential_tab.csv)

- Arab people are more likely to be in the 'likely agency' or 'high indicators' groups
- Asian people are more likely to be in any of the groups
- Black people are more likely to be in the 'likely agency' or 'outsourced' groups
- People of mixed ethnicity are more likely to be in the 'outsourced' group
- People who selected Other ethnicity are more likely to be in the 'likely agency' group
- White other people are more likely to be in the 'outsourced' group

```{r}
sjPlot::plot_model(mod)
```

```{r}
mod <- multinom(outsourcing_group ~ Ethnicity_collapsed_disaggregated, data, weights = NatRepemployees)
# summary(mod)

# get coefficients and calculate p
coefs <- summary(mod)$coefficients
# get predicted group names to insert later
group <- rownames(coefs)
ors <- exp(coefs)
colnames(ors) <- paste(colnames(ors), "or", sep = "_")

z <- coefs / summary(mod)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
colnames(p) <- paste(colnames(p), "p", sep = "_")

p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))
sig_ors <- exp(summary(mod)$coefficients * p_2)

# add to table for saving
coefs2 <- cbind(coefs, ors, p) %>%
  as_tibble() %>%
  mutate(
    predicted_group = group, .before = everything() # insert predicted group so output table can be better interpreted
  )

sig_ors2 <- sig_ors[, colSums(!is.na(sig_ors)) > 0]
sig_ors2 <- t(sig_ors2)

# get the sample information
sample_count <- data %>%
  group_by(outsourcing_group, Ethnicity_collapsed_disaggregated) %>%
  summarise(
    n = n(),
    freq = sum(NatRepemployees)
  ) %>%
  filter(outsourcing_group != "Not outsourced") %>%
  pivot_wider(names_from = outsourcing_group, values_from = c(n, freq))

# combine sample info with estimates
# NAs in this table simply indicate non-sig results
sig_ors2 <- as.data.frame(sig_ors2) %>%
  tibble::rownames_to_column(var = "Ethnicity_collapsed_disaggregated") %>%
  mutate(
    Ethnicity_collapsed_disaggregated = sub(".*disaggregated", "", Ethnicity_collapsed_disaggregated)
  ) %>%
  filter(Ethnicity_collapsed_disaggregated != "(Intercept)") %>%
  left_join(sample_count, by = "Ethnicity_collapsed_disaggregated")

write_csv(coefs2, file = "../outputs/data/ethnicity_ogroup_inferential_tab_2.csv")
```

Disaggregating ethnicities further reveals more nuance.[^7] The table below shows the likelihood of workers of different ethnicities falling into each of the outsourcing groups, compared to White British workers. Note that only significant relationships are shown here. *Note also that the 'n' for many of these statistics is very low. As such, many of these statistics are illustrative but not inferential.*

[^7]: [outputs/data/ethnicity_ogroup_inferential_tab_2.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_ogroup_inferential_tab_2.csv)

```{r}
sig_ors2 %>%
  rename(Ethnicity = Ethnicity_collapsed_disaggregated) %>%
  kable(caption = "Likelihood of belonging to different groups compared to White British. Note: NAs are non-sig. relationships. 
'n_' is sample size, 'freq_' is weighted sample size", digits = 2) %>%
  kable_styling(full_width = F)
```

::: {.callout-tip title="#ethnicity-pay-split"}
- On the low-pay / high-pay split, you say “*A person is more likely to be in the low income group if they are: Older; Female; Prefer not to say when they arrived, And less likely if they are: Asian/Asian British; Live in North West or Wales; Arrived in the UK in last 30 years*”. Can I confirm this means we don’t see any other significant differences in the ethnicity breakdown if we look at high paid vs low paid workers? If so, let’s clarify what this says about how ethnicity relates to a) outsourced workers being disproportionately low paid, but b) ethnic minority workers being no more likely to be in our low pay group.

*Using the new ethnicity groupings, there is no evidence indicating that any ethnicity is more or less likely to be in the low income group.*

**Note to self: This could benefit from stepwise regression**
:::

```{r income-group}
#| output: false
#| message: false
# test significance
# mod <- glm(income_group ~ outsourcing_status, data, family="quasibinomial", weights = NatRepemployees)
# summary(mod)
#
# test <- summary(mod)
#
# or <- exp(mod[["coefficients"]][["outsourcing_statusOutsourced"]])
# p <- test[["coefficients"]][2,4]

mod_2 <- glm(income_group ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region + outsourcing_status + BORNUK_labelled, income_data, family = "quasibinomial", weights = NatRepemployees)
summary(mod_2)

test <- summary(mod_2)
or <- exp(mod_2[["coefficients"]][["outsourcing_statusOutsourced"]])
p <- test[["coefficients"]][2,4]

rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_glm_coefs(mod_2, only_sig = T)

write_csv(coef_table, file = "../outputs/data/income_group_outsourcing.csv")
```

```{r}
#| output: false
#| message: false
mod_2 <- glm(income_group ~ Ethnicity_collapsed * outsourcing_status, income_data, family = "quasibinomial", weights = NatRepemployees)
summary(mod_2)

mod_3 <- glm(income_group ~ Ethnicity_binary * outsourcing_status, data, family = "quasibinomial", weights = NatRepemployees)
summary(mod_3)

black_coef <- extract_glm_coefs(mod_2,, only_sig = T)[5, "Estimate"] %>%
  exp(.) %>%
  round(., 2) %>%
  pull()
other_coef <- extract_glm_coefs(mod_2,, only_sig = T)[6, "Estimate"] %>%
  exp(.) %>%
  round(., 2) %>%
  pull()
```

A person is more likely to be in the low income group if they:

- Are older
- Are female
- Don't have a degree (or don't know if they have a degree?)
- Are outsourced
- Arrived in the UK in the last year

And less likely if they:

- Are younger
- Are male
- Have a degree
- Live in the North West or Wales (compared to London)
- Arrived in the UK in the last 30 years

::: {.callout-tip title="#migration"}
- As you would expect, the vast majority of outsourced workers were born in the UK. However, we still see a significantly higher likelihood of outsourced workers having been born outside of the UK compared to people who aren’t outsourced. While around 14% of non-outsourced workers were born outside of the UK, this rose to just over 24% for outsourced workers – or nearly 1 in 4.
- Overall, people who were born outside of the UK are 1.94 times more likely to be in outsourced work than people who were born here.
:::

```{r}
data <- data %>%
  mutate(
    BORNUK_collapsed = forcats::fct_collapse(BORNUK_labelled,
      "Born in UK" = "I was born in the UK",
      "Came to UK recently" = c("Within the last year"),
      "Came to UK not recently" = c(
        "Within the last 3 years",
        "Within the last 5 years",
        "Within the last 10 years",
        "Within the last 15 years",
        "Within the last 20 years",
        "Within the last 30 years",
        "More than 30 years ago"
      ),
      "Prefer not to say" = c("Prefer not to say")
    )
  )

bornuk_statistics <- data %>%
  group_by(outsourcing_status, BORNUK_collapsed) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

readr::write_csv(bornuk_statistics, file = "../outputs/data/arrival_in_UK_collapsed_stats.csv")

bornuk_statistics %>%
  ggplot(., 
aes(BORNUK_collapsed, Percentage, fill = outsourcing_status)) +
  geom_col(colour = "black", position = "dodge") +
  geom_text(aes(BORNUK_collapsed, y = 99, label = paste0("n = ", n)), position = position_dodge(width = 1), hjust = 1) +
  coord_flip() +
  scale_fill_manual(values = many_colours, name = "Outsourcing status") +
  theme_minimal() +
  xlab("Arrival in UK")
```

```{r bornuk_inferential, output=FALSE}
mod <- glm(outsourcing_status ~ BORNUK_binary, data, weights = NatRepemployees, family = "quasibinomial")
# mod <- glm(Ethnicity_binary~outsourcing_status , data, weights = NatRepemployees, family="quasibinomial")
summary(mod)

coefs <- extract_glm_coefs(mod)
write_csv(coefs, file = "../outputs/data/bornuk_ostatus_inferential_tab.csv")
```

As with non-outsourced workers, the vast majority of outsourced workers were born in the UK. However, people not born in the UK are more likely to be outsourced than people born in the UK. `r 100 - round(bornuk_statistics[which(bornuk_statistics$outsourcing_status == "Outsourced" & bornuk_statistics$BORNUK_collapsed == "Born in UK"), "Percentage"],2)`% of outsourced workers were not born in the UK, compared to `r 100 - round(bornuk_statistics[which(bornuk_statistics$outsourcing_status == "Not outsourced" & bornuk_statistics$BORNUK_collapsed == "Born in UK"), "Percentage"],2)`% of non-outsourced workers.[^8] This difference is statistically significant; **outsourced workers are `r round(coefs %>% filter(variable == "BORNUK_binaryNot born in UK") %>% pull(or),2)` times more likely to have been born outside the UK than non-outsourced workers.**[^9]

[^8]: [outputs/data/arrival_in_UK_stats.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/arrival_in_UK_stats.csv)

[^9]: [outputs/data/bornuk_ostatus_inferential_tab.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/bornuk_ostatus_inferential_tab.csv)

::: {.callout-tip title="#migration-sub-groups"}
- This pattern broadly holds across our three outsourcing sub-groups, with almost no
difference in the likelihood of people born outside of the UK being in any one of the three groups.
:::

```{r}
mod <- multinom(outsourcing_group ~ BORNUK_binary, data, weights = NatRepemployees)
# summary(mod)

# get coefficients and calculate p
coefs <- summary(mod)$coefficients
ors <- exp(coefs)
colnames(ors) <- paste(colnames(ors), "or", sep = "_")

z <- coefs / summary(mod)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
colnames(p) <- paste(colnames(p), "p", sep = "_")

p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))
sig_ors <- exp(summary(mod)$coefficients * p_2)

# add to table for saving
coefs <- cbind(coefs, ors, p) %>%
  as_tibble()

write_csv(coefs, file = "../outputs/data/bornuk_ogroup_inferential_tab.csv")
# sig_ors

bornuk_statistics_ogroup <- data %>%
  group_by(outsourcing_group, BORNUK_collapsed) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

readr::write_csv(bornuk_statistics_ogroup, file = "../outputs/data/arrival_in_UK_collapsed_stats_ogroup.csv")

bornuk_statistics_ogroup %>%
  ggplot(., aes(BORNUK_collapsed, Percentage, fill = outsourcing_group)) +
  geom_col(colour = "black", position = "dodge") +
  geom_text(aes(BORNUK_collapsed, y = 99, label = paste0("n = ", n)), position = position_dodge(width = 1), hjust = 1) +
  coord_flip() +
  scale_fill_manual(values = many_colours, name = "Outsourcing status") +
  theme_minimal() +
  xlab("Arrival in UK")
```

::: {.callout-warning title="#ethnicity-migration-interaction. 
Some attention needed here"}
Among all workers who were born in the UK:

- Black workers are 2.01 times more likely to be outsourced than a White worker.
- Asian workers are 2.02 times more likely to be outsourced than a White worker.
- Workers from Other ethnic backgrounds are X times more likely to be outsourced than a White other worker.

For workers born outside of the UK:

- Among White workers, someone not born in the UK is 1.82 times more likely to be outsourced than someone born in the UK.
- Among workers from Mixed ethnic backgrounds, someone not born in the UK is 2.73 times more likely to be outsourced than someone born in the UK.
- Among Other workers, someone not born in the UK is 0.13 times as likely (i.e., less likely) to be outsourced as someone born in the UK.

For workers from other ethnicities, being born in the UK makes no difference: a Black or an Asian worker is equally likely to be outsourced whether they were born in the UK or somewhere else. And compared to a White person born in the UK, Black African and South Asian workers specifically are more likely to be outsourced, whether or not they were born in the UK. Does this need any further detail or explanation?

**To discuss confidence in our interpretation in this section: The evidence on ethnicity and country of birth clearly paints a racialised picture of outsourcing, and one with colonial undertones, as Black African and South Asian workers see a higher risk of being outsourced compared to White British workers, regardless of their country of birth. This obviously raises further questions about why, linked to (sector, occupation, labour market inequality and structural racism). Discuss the draft interpretation in the comment on the right.**

**However, workers from non-White ethnic groups are not the only workers who see a higher risk of being outsourced: Non-UK-born White workers are also more likely to be outsourced than UK-born White people.
Ethnicity and country of birth act independently for some groups, but seem to be fundamentally connected for others.**
:::

```{r}
base_mod <- mod <- glm(outsourcing_status ~ Ethnicity_collapsed + BORNUK_binary, data, weights = NatRepemployees, family = "quasibinomial")

mod <- glm(outsourcing_status ~ Ethnicity_collapsed*BORNUK_binary, data, weights = NatRepemployees, family = "quasibinomial")
# summary(mod)

# check that interaction improves the model over main effects - it does
anova(base_mod, mod, test = "F")

coefs <- extract_glm_coefs(mod)
```

```{r}
ems <- emmeans(mod, specs = "Ethnicity_collapsed", by = "BORNUK_binary")
cons <- summary(contrast(ems, "pairwise", adjust = "tukey"))

sig_cons <- cons %>%
  filter(p.value < .05) %>%
  mutate(
    or = 1 / exp(estimate), .after = estimate # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison)
  )

write_csv(cons, file = "../outputs/data/ethnicity_bornUK_binary_contrasts.csv")
```

Exploring the intersection of ethnicity and country of birth reveals that the likelihood of a person being outsourced depends on the combination of their ethnicity and whether they were born in the UK.[^10] The plot below shows that:

[^10]: [outputs/data/bornUK_binary_contrasts.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/bornUK_binary_contrasts.csv)

- Among workers born in the UK, a Black worker is `r round(sig_cons %>% filter(contrast == "White British - (Black/African/Caribbean/Black British)") %>% pull(or),2)` times more likely to be outsourced than a White British worker.
- Among workers born in the UK, an Asian worker is `r round(sig_cons %>% filter(contrast == "White British - (Asian/Asian British)") %>% pull(or),2)` times more likely to be outsourced than a White British worker.
- Among workers born in the UK, an Other ethnicity worker is `r round(sig_cons %>% filter(contrast == "White British - Other ethnic group") %>% pull(or),2)` times more likely to be outsourced than a White British
worker.
- Among workers not born in the UK, a White other worker is `r round(sig_cons %>% filter(contrast == "White British - White other") %>% pull(or),2)` times as likely (i.e., less likely) to be outsourced as a White British worker.
- Among workers not born in the UK, a White other worker is `r round(sig_cons %>% filter(contrast == "(Black/African/Caribbean/Black British) - White other") %>% pull(or),2)` times as likely (i.e., less likely) to be outsourced as a Black worker.
- Among workers not born in the UK, a White other worker is `r round(sig_cons %>% filter(contrast == "(Mixed/Multiple ethnic group) - White other") %>% pull(or),2)` times as likely (i.e., less likely) to be outsourced as a worker of mixed ethnicity.

```{r}
sjPlot::plot_model(mod, type = "pred", legend.title = "", terms = c("BORNUK_binary", "Ethnicity_collapsed"), dodge = 0.5) +
  coord_flip() +
  xlab("") +
  ylab("Likelihood of being outsourced") +
  theme_minimal()
```

```{r}
ems_2 <- emmeans(mod, specs = "BORNUK_binary", by = "Ethnicity_collapsed")
cons <- summary(contrast(ems_2, "pairwise", adjust = "tukey"))

sig_cons <- cons %>%
  filter(p.value < .05) %>%
  mutate(
    or = 1 / exp(estimate), .after = estimate # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison)
  )

write_csv(cons, file = "../outputs/data/bornUK_binary_contrasts_2.csv")
```

Similarly, the plot below shows that:[^11]

[^11]: [outputs/data/region_stats_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_2.csv)

- Among White British workers, someone not born in the UK is `r round(sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "White British") %>% pull(or),2)` times more likely to be outsourced than someone born in the UK.
- Among Mixed workers, someone not born in the UK is `r round(sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "Mixed/Multiple ethnic group") %>% pull(or),2)` times more likely to be
outsourced than someone born in the UK.
- Among Other ethnicity workers, someone not born in the UK is `r round(sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "Other ethnic group") %>% pull(or),2)` times as likely (i.e., `r round(100 * (1 - (sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "Other ethnic group") %>% pull(or))),0)`% less likely) to be outsourced as someone born in the UK.
- Among people who preferred not to say their ethnicity, someone not born in the UK is `r round(sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "Prefer not to say") %>% pull(or),2)` times as likely (i.e., `r round(100 * (1 - (sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "Prefer not to say") %>% pull(or))),0)`% less likely) to be outsourced as someone born in the UK.

```{r}
sjPlot::plot_model(mod, type = "pred", legend.title = "", terms = c("Ethnicity_collapsed", "BORNUK_binary"), dodge = 0.5) +
  coord_flip() +
  xlab("") +
  ylab("Likelihood of being outsourced") +
  theme_minimal()
```

::: {.callout-tip title="#migration-by-pay-split"}
If we do a basic “born UK / not born UK” split, looking by low and high pay, what % of the low-paid workers group were born outside of the UK, vs in the high-paid group?
:::

```{r}
#| message: false
mig_pay_split <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_status, income_group, BORNUK_binary) %>%
  summarise(
    freq = sum(NatRepemployees),
    n = n()
  ) %>%
  mutate(
    total = sum(freq),
    percentage = 100 * (freq / total),
    N = sum(n)
  )

low_pay_perc <- mig_pay_split %>%
  filter(income_group == "Low" & BORNUK_binary == "Not born in UK" & outsourcing_status == "Outsourced") %>%
  mutate(
    round(percentage, 2)
  ) %>%
  pull()

high_pay_perc <- mig_pay_split %>%
  filter(income_group == "Not low" & BORNUK_binary == "Not born in UK" & outsourcing_status == "Outsourced") %>%
  mutate(
    round(percentage, 2)
  ) %>%
  pull()

mod <- glm(income_group ~ BORNUK_binary, 
income_data, weights = NatRepemployees, family = "quasibinomial")
# summary(mod)

mod_2 <- glm(income_group ~ BORNUK_binary * outsourcing_status, income_data, weights = NatRepemployees, family = "quasibinomial")
# summary(mod_2)
```

`r low_pay_perc`% of outsourced workers in the low pay group were not born in the UK, compared to `r high_pay_perc`% of outsourced workers in the not low pay group. This difference is marginally statistically significant; someone in the low income group is less likely to have been born outside the UK than someone in the not low income group. This pattern is the same for non-outsourced workers, and when we consider the interaction between outsourcing status and migration status, the only factor predicting income group is outsourcing status.

```{r}
mig_pay_split %>%
  ggplot(aes(income_group, percentage, fill = BORNUK_binary)) +
  facet_grid(rows = vars(outsourcing_status)) +
  geom_col(position = "dodge") +
  theme_minimal()
```

## Outsourced workers are on average younger than non-outsourced workers {#sec-age}

::: {.callout-tip title="#age"}
- We find that outsourced workers are significantly younger than non-outsourced workers, on average. The median age of an outsourced worker is 35, compared to a median age of 43 for a non-outsourced worker.
- The outsourced and indicator sub-groups – people who directly said that they were or might be outsourced, or ticked a high number of our indicators of outsourced working – see higher proportions of younger workers than the “agency” sub-group.
:::

::: {.callout-important title="#age-violin"}
INSERT VIOLIN PLOT CHART HERE SHOWING MEDIAN AGE OF EACH SUB-GROUP, COMPARED TO NON-OUTSOURCED WORKERS. **Is this necessary?
We already have the density plots**
:::

```{r age-by-status}
age_statistics <- data %>%
  group_by(outsourcing_status) %>%
  summarise(
    mean = weighted.mean(Age, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(Age, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(Age, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(Age, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(Age, w = NatRepemployees, na.rm = T)),
    N = n()
  )

readr::write_csv(age_statistics, file = "../outputs/data/age_stats.csv")
```

```{r age-inferential, include=FALSE}
test <- lm(Age ~ outsourcing_status, weights = NatRepemployees, data)
summary(test)

coefs <- extract_lm_coefs(test)
readr::write_csv(coefs, file = "../outputs/data/age_inferential.csv")
```

Outsourced workers are on average younger than non-outsourced workers. The median age of the outsourced group is `r age_statistics[which(age_statistics$outsourcing_status=="Outsourced"),"median"]`, compared to `r age_statistics[which(age_statistics$outsourcing_status=="Not outsourced"),"median"]` for the not outsourced group.[^12] This difference is statistically significant.[^13]

[^12]: [outputs/data/region_stats_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_2.csv)

[^13]: [outputs/data/region_stats_3.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_3.csv)

```{r age-by-status-plot}
knitr::kable(age_statistics, digits = 2, col.names = c("Outsourcing group", "Mean", "Median", "Min", "Max", "Standard dev.", "N")) %>%
  kable_styling(full_width = F)

data %>%
  mutate(
    Age = as.numeric(as.character(as_factor(Age)))
  ) %>%
  ggplot(., aes(Age, colour = outsourcing_status, fill = outsourcing_status)) +
  geom_density(alpha = 0.3) +
  geom_vline(data = age_statistics, aes(xintercept = median, colour = outsourcing_status)) +
  scale_x_continuous(breaks = seq(min(age_statistics$min), max(age_statistics$max), 5)) +
  theme_minimal() +
  scale_colour_manual(values = colours, name = "Outsourcing status") +
  scale_fill_manual(values = colours, name = "Outsourcing status")
```

The higher concentration of younger workers identified above appears to be driven primarily by the 'outsourced' and 'high indicator' groups, whilst the 'likely agency' group follows a similar pattern to the non-outsourced group.[^14]

[^14]: [outputs/data/sector_summary_3.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/sector_summary_3.csv)

```{r}
age_statistics_income_group <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_status, income_group) %>%
  summarise(
    mean = weighted.mean(Age, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(Age, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(Age, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(Age, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(Age, w = NatRepemployees, na.rm = T)),
    N = n()
  )

knitr::kable(age_statistics_income_group, digits = 2, col.names = c("Outsourcing status", "Income group", "Mean", "Median", "Min", "Max", "Standard dev.", "N")) %>%
  kable_styling(full_width = F)

income_data %>%
  filter(!is.na(income_group)) %>%
  mutate(
    Age = as.numeric(as.character(as_factor(Age)))
  ) %>%
  ggplot(., aes(Age, colour = outsourcing_status, fill = outsourcing_status)) +
  facet_grid(rows = vars(income_group)) +
  geom_density(alpha = 0.3) +
  geom_vline(data = age_statistics_income_group, aes(xintercept = median, colour = outsourcing_status)) +
  scale_x_continuous(breaks = seq(min(age_statistics_income_group$min), max(age_statistics_income_group$max), 5)) +
  theme_minimal() +
  scale_colour_manual(values = colours, name = "Outsourcing status") +
  scale_fill_manual(values = colours, name = "Outsourcing status")
```

```{r age-by-group}
age_statistics_2 <- data %>%
  group_by(outsourcing_group) %>%
  summarise(
    mean = weighted.mean(Age, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(Age, w = NatRepemployees, probs = c(.5), na.rm = T),
    min =
      wtd.quantile(Age, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(Age, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(Age, w = NatRepemployees, na.rm = T)),
    N = n()
  )

readr::write_csv(age_statistics_2, file = "../outputs/data/age_stats_2.csv")
```

```{r age-by-group-plot}
knitr::kable(age_statistics_2, digits = 2, col.names = c("Outsourcing group", "Mean", "Median", "Min", "Max", "Standard dev.", "N")) %>%
  kable_styling(full_width = F)

data %>%
  ggplot(., aes(Age, colour = outsourcing_group, fill = outsourcing_group)) +
  geom_density(alpha = 0.2) +
  geom_vline(data = age_statistics_2, aes(xintercept = median, colour = outsourcing_group)) +
  scale_x_continuous(breaks = seq(min(age_statistics_2$min), max(age_statistics_2$max), 5)) +
  theme_minimal() +
  scale_colour_manual(values = better_colours, name = "Outsourcing group") +
  scale_fill_manual(values = better_colours, name = "Outsourcing group")
```

```{r}
age_statistics2_income_group <- income_data %>%
  filter(!is.na(income_group)) %>%
  group_by(outsourcing_group, income_group) %>%
  summarise(
    mean = weighted.mean(Age, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(Age, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(Age, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(Age, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(Age, w = NatRepemployees, na.rm = T)),
    N = n()
  )

knitr::kable(age_statistics2_income_group, digits = 2, col.names = c("Outsourcing group", "Income group", "Mean", "Median", "Min", "Max", "Standard dev.", "N")) %>%
  kable_styling(full_width = F)

income_data %>%
  filter(!is.na(income_group)) %>%
  mutate(
    Age = as.numeric(as.character(as_factor(Age)))
  ) %>%
  ggplot(., aes(Age, colour = outsourcing_group, fill = outsourcing_group)) +
  facet_grid(rows = vars(income_group)) +
  geom_density(alpha = 0.3) +
  geom_vline(data = age_statistics2_income_group, aes(xintercept = median, colour = outsourcing_group)) +
  scale_x_continuous(breaks = seq(min(age_statistics2_income_group$min),
    max(age_statistics2_income_group$max), 5)) +
  theme_minimal() +
  scale_colour_manual(values = colours, name = "Outsourcing group") +
  scale_fill_manual(values = colours, name = "Outsourcing group")
```

::: {.callout-tip title="#gender"}
- The evidence also finds meaningful differences by gender between the outsourced and non-outsourced groups in our data. Men make up 56% of the outsourced workforce compared to 47% of the non-outsourced workforce, a nearly 10 percentage point difference.
- Outsourced workers are 1.44 times more likely to be male than female.
- The group with the largest proportion of men in the workforce is the ‘high indicators’ group (66.35%), followed by the ‘likely agency’ group (56.66%), followed by the ‘outsourced’ group (53.94%). Comparison of outsourced and non-outsourced workers finds that:
- Someone in the high indicators sub-group is 2.18 times more likely to be male than female.
- Someone in the agency sub-group is 1.45 times more likely to be male than female.
- Someone in the outsourced sub-group is 1.31 times more likely to be male than female.

:::

::: {.callout-important title="#gender-sector"}
- Possible addition: Will readers want to know more about how this intersects with the roles or sectors with higher rates of outsourcing – even if this is just an interpretive comment from us on how gender interacts with jobs and sectors more generally in the labour market?

:::

```{r}
gender_statistics <- data %>%
  group_by(outsourcing_status, Gender) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

readr::write_csv(gender_statistics, file = "../outputs/data/gender_statistics.csv")
```

```{r gender-outsourcing-status}
mod <- multinom(Gender ~ outsourcing_status, data, weights = NatRepemployees)
# summary(mod)
# get coefficients and calculate p
coefs <- summary(mod)$coefficients
ors <- exp(coefs)
colnames(ors) <- paste(colnames(ors), "or", sep = "_")
z <- coefs / summary(mod)$standard.errors
p <- (1 -
  pnorm(abs(z), 0, 1)) * 2
colnames(p) <- paste(colnames(p), "p", sep = "_")
p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))
sig_ors <- exp(summary(mod)$coefficients * p_2)
coefs <- cbind(coefs, ors, p) %>% as_tibble()
write_csv(coefs, file = "../outputs/data/gender_inferential_tab.csv")
```

The outsourced workforce consists of a greater proportion of males than the non-outsourced workforce.[^15] Men make up `r round(gender_statistics[which(gender_statistics$outsourcing_status == "Outsourced" & gender_statistics$Gender == "Male"),"Percentage"], 0)`% of the outsourced workforce compared to `r round(gender_statistics[which(gender_statistics$outsourcing_status == "Not outsourced" & gender_statistics$Gender == "Male"),"Percentage"], 0)`% of the non-outsourced workforce. This difference is statistically significant; outsourced workers, compared to non-outsourced workers, are `r round(sig_ors['Male', 'outsourcing_statusOutsourced'], 2)` times more likely to be male than female.[^16]

[^15]: [outputs/data/sector_summary_3.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/sector_summary_3.csv)

[^16]: [../outputs/data/gender_inferential_tab.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/gender_inferential_tab.csv)

```{r}
# gender_statistics %>%
#   kable() %>%
#   kable_styling(full_width = F)

gender_statistics %>%
  ggplot(., aes(outsourcing_status, Percentage, fill = Gender)) +
  geom_col(colour = "black") +
  # annotate("text", x = gender_statistics$outsourcing_status, y = 75, label = paste0("n=", gender_statistics$Frequency)) +
  coord_flip() +
  scale_fill_manual(values = colours) +
  theme_minimal() +
  xlab("Outsourcing group") +
  annotate("text", x = gender_statistics$outsourcing_status, y = 99, label = paste0("N = ", gender_statistics$N), hjust = 1)
```

```{r}
gender_statistics_2 <- data %>%
  group_by(outsourcing_group, Gender) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

readr::write_csv(gender_statistics_2, file = "../outputs/data/gender_statistics_2.csv")
```

```{r gender-outsourcing-group}
mod <- multinom(Gender ~ outsourcing_group, data, weights = NatRepemployees)
# summary(mod)
# get coefficients and calculate p
coefs <- summary(mod)$coefficients
ors <- exp(coefs)
colnames(ors) <- paste(colnames(ors), "or", sep = "_")
z <- coefs / summary(mod)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
colnames(p) <- paste(colnames(p), "p", sep = "_")
p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))
sig_ors <- exp(summary(mod)$coefficients * p_2)
# add to table for saving
coefs <- cbind(coefs, ors, p) %>% as_tibble()
write_csv(coefs, file = "../outputs/data/gender_inferential_tab_2.csv")
```

Breaking down by outsourcing group, we find that the group with the largest proportion of men in the workforce is the 'high indicators' group (`r round(gender_statistics_2 %>% filter(outsourcing_group=="High indicators" & Gender == "Male") %>% pull(Percentage), 2)`%), followed by the 'likely agency' group (`r round(gender_statistics_2 %>% filter(outsourcing_group=="Likely agency" & Gender == "Male") %>% pull(Percentage), 2)`%), followed by the 'outsourced' group (`r round(gender_statistics_2 %>% filter(outsourcing_group=="Outsourced" & Gender == "Male") %>% pull(Percentage), 2)`%).
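
A note on interpretation: the "times more likely" figures in this section are odds ratios, obtained by exponentiating a model's log-odds coefficients and screening them with a two-sided Wald z-test. The sketch below illustrates that arithmetic only. It is an assumption-laden toy: the `toy` data frame and its values are made up (not survey data), there are no weights, and base R's `glm()` with a binomial family stands in for `multinom()` so that it runs without extra packages.

```r
# Hypothetical toy data (NOT survey data): 12 workers, 1 = male, 0 = female.
toy <- data.frame(
  male = c(1, 1, 1, 1, 0, 0,   # outsourced: 4 men, 2 women
           1, 1, 0, 0, 0, 0),  # not outsourced: 2 men, 4 women
  status = factor(
    rep(c("Outsourced", "Not outsourced"), each = 6),
    levels = c("Not outsourced", "Outsourced")  # first level is the reference
  )
)

# Logistic regression: log-odds of being male by outsourcing status.
fit <- glm(male ~ status, data = toy, family = binomial)
est <- summary(fit)$coefficients["statusOutsourced", ]

# Exponentiating the log-odds coefficient gives the odds ratio.
odds_ratio <- unname(exp(est["Estimate"]))

# Two-sided Wald test, mirroring the z and p calculation used in this section.
z <- unname(est["Estimate"] / est["Std. Error"])
p_value <- (1 - pnorm(abs(z))) * 2

round(c(odds_ratio = odds_ratio, z = z, p = p_value), 3)
```

In this toy example the odds of being male are 2 among outsourced workers (4 to 2) and 0.5 among the rest (2 to 4), so the odds ratio is 4. Strictly speaking this is a ratio of odds rather than of probabilities, which is worth bearing in mind when reading the figures as "times more likely".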
Statistically speaking, compared to a non-outsourced person:

- Someone in the high indicators group is `r round(sig_ors['Male', 'outsourcing_groupHigh indicators'],2)` times more likely to be male than female.
- Someone in the likely agency group is `r round(sig_ors['Male', 'outsourcing_groupLikely agency'],2)` times more likely to be male than female.
- Someone in the outsourced group is `r round(sig_ors['Male', 'outsourcing_groupOutsourced'],2)` times more likely to be male than female.

Additionally, people identifying as 'Other' gender are absent from the high indicators and likely agency groups, though given the small N (`r sum(data$Gender=="Other")`) for this group, this finding is unlikely to be meaningful.

```{r}
# gender_statistics_2 %>%
#   kable() %>%
#   kable_styling(full_width = F)

gender_statistics_2 %>%
  ggplot(., aes(outsourcing_group, Percentage, fill = Gender)) +
  geom_col(colour = "black") +
  # annotate("text", x = gender_statistics$outsourcing_status, y = 75, label = paste0("n=", gender_statistics$Frequency)) +
  coord_flip() +
  scale_fill_manual(values = colours) +
  theme_minimal() +
  xlab("Outsourcing group") +
  annotate("text", x = gender_statistics_2$outsourcing_group, y = 99, label = paste0("N = ", gender_statistics_2$N), hjust = 1)
```

## Outsourced workers are more likely to work in some sectors than others; but seem to be spread across the labour market

::: {.callout-tip title="#sectors"}
- The three most common sectors for outsourced workers in our survey to be employed within – excluding those with an N size below X (50?)
– were administrative and support service activities; water supply, sewerage, waste management and remediation activities; and other service activities.
- Five of the twenty employment sectors have at least 1 in 5 of their workforce “outsourced”: more than the average of around 17% across the whole workforce.

:::

Here we explore what proportion of workers in each sector are outsourced.[^17]

[^17]: [outputs/data/sector_summary_3.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/sector_summary_3.csv)

```{r sector-summary-3}
sector_summary_3 <- data %>%
  # filter(income_drop_all == 0) %>%
  group_by(SectorName, SectorName_labelled, outsourcing_status) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    # avg_income = mean(income_annual_all, na.rm=T),
    # wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm=T)
  ) %>%
  ungroup() %>%
  group_by(SectorName) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency/Sum),
    SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA,
                                    TRUE ~ SectorName_labelled),
    SectorName_short = SectorName_labelled
  ) %>%
  # make the sector names more readable
  separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim = ";", too_few = "align_start") %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)),
  )

write_csv(sector_summary_3, file = "../outputs/data/sector_summary_3.csv")
```

The plot below shows the proportion of outsourced and not outsourced workers within each sector. I.e.
this is showing which sectors have higher and lower proportions of outsourced workers.

```{r sector-plot-2}
plot_data <- sector_summary_3 %>%
  drop_na(SectorName_short) %>%
  droplevels() %>%
  ungroup()

# Filter for the 'Not outsourced' level and reorder SectorName_short
not_outsourced_levels <- plot_data %>%
  filter(outsourcing_status == 'Not outsourced') %>%
  mutate(SectorName_short = forcats::fct_reorder(SectorName_short, perc, .desc = TRUE))

outsourced <- plot_data %>%
  filter(outsourcing_status == 'Outsourced') %>%
  mutate(
    rank = rank(desc(perc))
  )

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    SectorName_short = factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)),
  )

# annotation_df <- plot_data %>%
#   select(SectorName_short, outsourcing_status, perc, n
#   mutate(

annotation_df <- plot_data %>%
  filter(outsourcing_status == "Not outsourced") %>%
  select(SectorName_short, N) %>%
  mutate(
    ypos = 80
  )

ggplot(plot_data, aes(SectorName_short, perc, fill = outsourcing_status)) +
  geom_col() +
  geom_text(inherit.aes = F, data = annotation_df, aes(x = SectorName_short, y = ypos, label = paste0("N = ", N)), hjust = 1, nudge_y = 15) +
  coord_flip() +
  scale_fill_manual(values = many_colours) +
  scale_y_continuous(breaks = seq(0, 100, 10))

# sector_key <- data.frame("number" = seq(1,length(unique(plot_data$SectorName_labelled)),1),
#                          "Sector" = levels(plot_data$SectorName_labelled))
#
# sector_key %>%
#   kable() %>%
#   kable_styling(full_width = F)
```

The three sectors with the highest proportion of outsourced workers are:

- `r unique(plot_data$SectorName_labelled[plot_data$SectorName==3])` (note that N = 31)
- `r unique(plot_data$SectorName_labelled[plot_data$SectorName==4])`
- `r unique(plot_data$SectorName_labelled[plot_data$SectorName==22])`

Note that an undefined sector ('Not found') contained one of the largest proportions of outsourced workers (`r round(plot_data$perc[which(plot_data$SectorName==16 & plot_data$outsourcing_status=="Outsourced")],0)`%
of workers in the 'Not found' category were outsourced).

A key takeaway here is that whereas outsourced workers make up 17% of the workforce overall, this figure varies by sector, from 0% for Mining... and Extraterritorial organisations... all the way to `r round(outsourced[which(outsourced$rank==1),'perc'],0)`% for `r outsourced[which(outsourced$rank==1),'SectorName_short']`, with 5 out of 20 sectors having at least 20% of their workforce outsourced.

::: {.callout-tip title="#sectors-ogroup"}
- Figure X also shows how the total outsourced group in each sector splits into our three outsourced “sub-groups”. We find – as you might expect, based on its dominance within the group of outsourced workers – that outsourced workers in every sector are most likely to be in the “outsourced sub-group”, i.e. those who self-identified as outsourced workers.

:::

```{r}
sector_summary_3 <- data %>%
  # filter(income_drop_all == 0) %>%
  filter(outsourcing_group != "Not outsourced") %>%
  group_by(SectorName, SectorName_labelled, outsourcing_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    # avg_income = mean(income_annual_all, na.rm=T),
    # wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm=T)
  ) %>%
  ungroup() %>%
  group_by(SectorName) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency/Sum),
    SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA,
                                    TRUE ~ SectorName_labelled),
    SectorName_short = SectorName_labelled
  ) %>%
  # make the sector names more readable
  separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim = ";", too_few = "align_start") %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)),
  )

plot_data <- sector_summary_3 %>%
  drop_na(SectorName_short) %>%
  droplevels() %>%
  ungroup()

# Filter for 'outsourced' level and reorder SectorName_short
outsourced_levels <- plot_data %>%
  filter(outsourcing_group == 'Outsourced') %>%
  mutate(SectorName_short = forcats::fct_reorder(SectorName_short, perc, .desc = TRUE))

outsourced <- plot_data %>%
  filter(outsourcing_group == 'Outsourced') %>%
  mutate(
    rank = rank(desc(perc))
  )

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    SectorName_short = factor(SectorName_short, levels = levels(outsourced_levels$SectorName_short)),
  )

# annotation_df <- plot_data %>%
#   select(SectorName_short, outsourcing_status, perc, n
#   mutate(

annotation_df <- plot_data %>%
  filter(outsourcing_group == "Outsourced") %>%
  select(SectorName_short, N) %>%
  mutate(
    ypos = 80
  )

plot_data <- plot_data %>%
  filter(outsourcing_group != "Not outsourced")

ggplot(plot_data, aes(SectorName_short, perc, fill = outsourcing_group)) +
  geom_col() +
  geom_text(inherit.aes = F, data = annotation_df, aes(x = SectorName_short, y = ypos, label = paste0("N = ", N)), hjust = 1, nudge_y = 15) +
  coord_flip() +
  scale_fill_manual(values = many_colours) +
  scale_y_continuous(breaks = seq(0, 100, 10))
```

# Pay

::: {.callout-tip title="#pay"}
- Using regression analysis, we find that outsourced workers are on average paid £2,170 less than non-outsourced workers.
- The “outsourced sub-group” earns £3,813 less, and the “agency sub-group” £2,603 less, than the non-outsourced group. This shows that pay is lowest in the “outsourced sub-group” of workers, i.e. those who directly identified themselves as being outsourced. Figure X below shows the median and distribution of pay across the three outsourced sub-groups and the non-outsourced group, for comparison.

:::

::: {.callout-important title="#pay-violin"}
Violin plot for the above
:::

```{r income}
# filter to just cases where income is above the fifth percentile and lower than the 95th?
# I.e., drop the top and bottom 5%.
income_statistics <- data %>%
  filter(income_drop_all == 0 & !is.na(income_annual_all)) %>%
  group_by(outsourcing_status) %>%
  summarise(
    n = n(),
    mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = T))
  )

readr::write_csv(income_statistics, file = "../outputs/data/income_stats_o-status.csv")

mod <- lm(income_annual_all ~ outsourcing_status, income_data, weights = NatRepemployees)
# summary(mod)
coef_table <- extract_lm_coefs(mod)
rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_lm_coefs(mod, only_sig = T)
write_csv(coef_table, file = "../outputs/data/model_income_by_o-status.csv")

income_statistics_weekly <- data %>%
  filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>%
  group_by(outsourcing_status) %>%
  summarise(
    n = n(),
    mean = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(income_weekly_all, w = NatRepemployees, na.rm = T))
  )

readr::write_csv(income_statistics_weekly, file = "../outputs/data/weekly_income_stats_o-status.csv")

mod_weekly <- lm(income_weekly_all ~ outsourcing_status, income_data, weights = NatRepemployees)
# summary(mod)
coef_table_weekly <- extract_lm_coefs(mod_weekly)
rownames(coef_table_weekly) <- coef_table_weekly$variable
sig_coefs_weekly <- extract_lm_coefs(mod_weekly, only_sig = T)
write_csv(coef_table_weekly, file = "../outputs/data/model_income_by_o-status_weekly.csv")
```

The
tables and plots below show descriptive statistics on income and its distribution for outsourced and non-outsourced people. Regression analysis shows that **outsourced workers are on average paid £`r abs(round(coef_table['outsourcing_statusOutsourced','Estimate'],0))` less annually than non-outsourced workers**.[^18] Per week, **outsourced workers are on average paid £`r abs(round(coef_table_weekly['outsourcing_statusOutsourced','Estimate'],0))` less than non-outsourced workers**.

[^18]: [outputs/data/income_stats_o-status.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/income_stats_o-status.csv) & [outputs/data/model_income_by_o-status.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_income_by_o-status.csv)

Weekly stats here^[[outputs/data/weekly_income_stats_o-status.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/weekly_income_stats_o-status.csv) & [outputs/data/model_income_by_o-status_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_income_by_o-status_weekly.csv)]

```{r income-plot}
knitr::kable(income_statistics, digits = 2, col.names = c("Outsourcing status", "n", "Mean", "Median", "Min", "Max", "Standard dev.")) %>%
  kable_styling(full_width = F)

# plot the distribution of income for the two groups
data %>%
  filter(income_drop_all == 0 & !is.na(income_annual_all)) %>%
  ggplot(., aes(outsourcing_status, income_annual_all)) +
  geom_violin() +
  geom_boxplot(width = 0.3) +
  geom_text(inherit.aes = F, data = income_statistics, aes(outsourcing_status, y = 6e+04), label = paste0("Mean = ", round(income_statistics$mean, 0), "\n", "Median = ", income_statistics$median), nudge_x = 0.1, hjust = 0) +
  coord_cartesian(xlim = c(1, 2.5)) +
  theme_minimal() +
  xlab("Outsourcing status") +
  ylab("Annual income") +
  coord_cartesian(ylim = c(plyr::round_any(min(income_statistics$min), 5000, f = floor), plyr::round_any(max(income_statistics$max), 5000, f = ceiling))) +
  scale_y_continuous(breaks = seq(plyr::round_any(min(income_statistics$min), 5000, f = ceiling), plyr::round_any(max(income_statistics$max), 5000, f = ceiling), 10000))

# weekly
knitr::kable(income_statistics_weekly, digits = 2, col.names = c("Outsourcing status", "n", "Mean", "Median", "Min", "Max", "Standard dev.")) %>%
  kable_styling(full_width = F)

# plot the distribution of income for the two groups
data %>%
  filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>%
  ggplot(., aes(outsourcing_status, income_weekly_all)) +
  geom_violin() +
  geom_boxplot(width = 0.3) +
  # label height is on the weekly scale, matching the by-group weekly plot
  geom_text(inherit.aes = F, data = income_statistics_weekly, aes(outsourcing_status, y = 1300), label = paste0("Mean = ", round(income_statistics_weekly$mean, 0), "\n", "Median = ", income_statistics_weekly$median), nudge_x = 0.1, hjust = 0) +
  coord_cartesian(xlim = c(1, 2.5)) +
  theme_minimal() +
  xlab("Outsourcing status") +
  ylab("Weekly income") +
  coord_cartesian(ylim = c(plyr::round_any(min(income_statistics_weekly$min), 10, f = floor), plyr::round_any(max(income_statistics_weekly$max), 10, f = ceiling))) +
  scale_y_continuous(breaks = seq(plyr::round_any(min(income_statistics_weekly$min), 10, f = ceiling), plyr::round_any(max(income_statistics_weekly$max), 10, f = ceiling), 100))
```

```{r income-outsourcing-group}
# filter to just cases where income is above the fifth percentile and lower than the 95th?
# I.e., drop the top and bottom 5%.
income_statistics <- data %>%
  filter(income_drop_all == 0 & !is.na(income_annual_all)) %>%
  group_by(outsourcing_group) %>%
  summarise(
    n = n(),
    mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = T))
  )

readr::write_csv(income_statistics, file = "../outputs/data/income_stats_o-group.csv")

mod <- lm(income_annual_all ~ outsourcing_group, income_data, weights = NatRepemployees)
# summary(mod)
coef_table <- extract_lm_coefs(mod)
rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_lm_coefs(mod, only_sig = T)
write_csv(coef_table, file = "../outputs/data/model_income_by_o-group.csv")

income_statistics_weekly <- data %>%
  filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>%
  group_by(outsourcing_group) %>%
  summarise(
    n = n(),
    mean = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(income_weekly_all, w = NatRepemployees, na.rm = T))
  )

readr::write_csv(income_statistics_weekly, file = "../outputs/data/weekly_income_stats_o-group.csv")

mod_weekly <- lm(income_weekly_all ~ outsourcing_group, income_data, weights = NatRepemployees)
# summary(mod)
coef_table_weekly <- extract_lm_coefs(mod_weekly)
rownames(coef_table_weekly) <- coef_table_weekly$variable
sig_coefs_weekly <- extract_lm_coefs(mod_weekly, only_sig = T)
write_csv(coef_table_weekly, file = "../outputs/data/model_income_by_o-group_weekly.csv")
```

The tables
and plots below show descriptive statistics on income and its distribution for the outsourced groups. Only the full outsourced subgroup has lower income than non-outsourced people. Regression analysis shows that **workers in the 'outsourced' sub-group are on average paid £`r abs(round(coef_table['outsourcing_groupOutsourced','Estimate'],0))` less annually than non-outsourced workers**.[^18-group] Per week, **workers in the 'outsourced' sub-group are on average paid £`r abs(round(coef_table_weekly['outsourcing_groupOutsourced','Estimate'],0))` less than non-outsourced workers**.

[^18-group]: [outputs/data/income_stats_o-group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/income_stats_o-group.csv) & [outputs/data/model_income_by_o-group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_income_by_o-group.csv)

Weekly stats here^[[outputs/data/weekly_income_stats_o-group.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/weekly_income_stats_o-group.csv) & [outputs/data/model_income_by_o-group_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_income_by_o-group_weekly.csv)]

```{r income-plot-group}
knitr::kable(income_statistics, digits = 2, col.names = c("Outsourcing group", "n", "Mean", "Median", "Min", "Max", "Standard dev.")) %>%
  kable_styling(full_width = F)

# plot the distribution of income for the two groups
data %>%
  filter(income_drop_all == 0 & !is.na(income_annual_all)) %>%
  ggplot(., aes(outsourcing_group, income_annual_all)) +
  geom_violin() +
  geom_boxplot(width = 0.3) +
  geom_text(inherit.aes = F, data = income_statistics, aes(outsourcing_group, y = 6e+04), label = paste0("Mean = ", round(income_statistics$mean, 0), "\n", "Median = ", round(income_statistics$median, 0)), nudge_x = 0.1, hjust = 0) +
  coord_cartesian(xlim = c(1, 2.5)) +
  theme_minimal() +
  xlab("Outsourcing group") +
  ylab("Annual income") +
  coord_cartesian(ylim = c(plyr::round_any(min(income_statistics$min), 5000, f =
    floor), plyr::round_any(max(income_statistics$max), 5000, f = ceiling))) +
  scale_y_continuous(breaks = seq(plyr::round_any(min(income_statistics$min), 5000, f = ceiling), plyr::round_any(max(income_statistics$max), 5000, f = ceiling), 10000))

# weekly
knitr::kable(income_statistics_weekly, digits = 2, col.names = c("Outsourcing group", "n", "Mean", "Median", "Min", "Max", "Standard dev.")) %>%
  kable_styling(full_width = F)

# plot the distribution of income for the two groups
data %>%
  filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>%
  ggplot(., aes(outsourcing_group, income_weekly_all)) +
  geom_violin() +
  geom_boxplot(width = 0.3) +
  geom_text(inherit.aes = F, data = income_statistics_weekly, aes(outsourcing_group, y = 1300), label = paste0("Mean = ", round(income_statistics_weekly$mean, 0), "\n", "Median = ", round(income_statistics_weekly$median, 0)), nudge_x = 0.1, hjust = 0) +
  coord_cartesian(xlim = c(1, 2.5)) +
  theme_minimal() +
  xlab("Outsourcing group") +
  ylab("Weekly income") +
  coord_cartesian(ylim = c(plyr::round_any(min(income_statistics_weekly$min), 10, f = floor), plyr::round_any(max(income_statistics_weekly$max), 10, f = ceiling))) +
  scale_y_continuous(breaks = seq(plyr::round_any(min(income_statistics_weekly$min), 10, f = ceiling), plyr::round_any(max(income_statistics_weekly$max), 10, f = ceiling), 100))
```

```{r}
#| output: false
mod <- lm(income_annual_all ~ Age + Gender + Ethnicity_collapsed + Region + outsourcing_status, income_data, weights = NatRepemployees)
summary(mod)

mod_2 <- lm(income_annual_all ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region + outsourcing_status, income_data, weights = NatRepemployees)
summary(mod_2)

mod_3 <- update(mod_2, ~ . + BORNUK_labelled)
summary(mod_3)
# anova(mod_2, mod_3) # adding BORNUK improves model fit

coef_table <- extract_lm_coefs(mod_3)
rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_glm_coefs(mod_3, only_sig = T)
write_csv(coef_table, file = "../outputs/data/model_2_income_by_o-status.csv")

mod_3_weekly <- lm(income_weekly_all ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region + outsourcing_status + BORNUK_labelled, income_data, weights = NatRepemployees)
summary(mod_3_weekly)
coef_table_weekly <- extract_lm_coefs(mod_3_weekly)
rownames(coef_table_weekly) <- coef_table_weekly$variable
sig_coefs <- extract_glm_coefs(mod_3_weekly, only_sig = T)
write_csv(coef_table_weekly, file = "../outputs/data/model_2_income_by_o-status_weekly.csv")
```

This difference increases to £`r abs(round(coef_table['outsourcing_statusOutsourced','Estimate'],0))` annually (£`r abs(round(coef_table_weekly['outsourcing_statusOutsourced','Estimate'],0))` per week) when we take into account Age, Gender, Education, Ethnicity, Region, and Arrival Time.[^19] This analysis shows that all other variables, apart from Age, are in some way relevant to income. On average, and controlling for each of the other variables in the model, annually:

[^19]: [outputs/data/model_2_income_by_o-status.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_2_income_by_o-status.csv)

- Men earn £`r abs(round(coef_table['GenderMale','Estimate'],0))` more than women.
- People who have a degree earn £`r abs(round(coef_table['Has_DegreeYes','Estimate'],0))` more than people without a degree.
- Workers in all non-London regions earn less than workers in London:
    - East Midlands: -£`r abs(round(coef_table['RegionEast Midlands','Estimate'],0))`
    - East of England: -£`r abs(round(coef_table['RegionEast of England','Estimate'],0))`
    - North East: -£`r abs(round(coef_table['RegionNorth East','Estimate'],0))`
    - North West: -£`r abs(round(coef_table['RegionNorth West','Estimate'],0))`
    - Northern Ireland: -£`r abs(round(coef_table['RegionNorthern Ireland','Estimate'],0))`
    - Scotland: -£`r abs(round(coef_table['RegionScotland','Estimate'],0))`
    - South East: -£`r abs(round(coef_table['RegionSouth East','Estimate'],0))`
    - Wales: -£`r abs(round(coef_table['RegionWales','Estimate'],0))`
    - West Midlands: -£`r
abs(round(coef_table['RegionWest Midlands','Estimate'],0))`
    - Yorkshire and the Humber: -£`r abs(round(coef_table['RegionYorkshire and the Humber','Estimate'],0))`
- People who arrived in the UK within the last year earn £`r abs(round(coef_table['BORNUK_labelledWithin the last year','Estimate'],0))` less than people born in the UK.
- People who arrived in the UK within the last 3 years earn £`r abs(round(coef_table['BORNUK_labelledWithin the last 3 years','Estimate'],0))` less than people born in the UK.
- People who arrived in the UK within the last 5 years earn £`r abs(round(coef_table['BORNUK_labelledWithin the last 5 years','Estimate'],0))` less than people born in the UK.
- People who arrived within the last 30 years earn £`r abs(round(coef_table['BORNUK_labelledWithin the last 30 years','Estimate'],0))` more than people born in the UK.

Weekly^[[outputs/data/model_2_income_by_o-status_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/model_2_income_by_o-status_weekly.csv)]:

- Men earn £`r abs(round(coef_table_weekly['GenderMale','Estimate'],0))` more than women.
- People who have a degree earn £`r abs(round(coef_table_weekly['Has_DegreeYes','Estimate'],0))` more than people without a degree.
- Workers in all non-London regions earn less than workers in London:
    - East Midlands: -£`r abs(round(coef_table_weekly['RegionEast Midlands','Estimate'],0))`
    - East of England: -£`r abs(round(coef_table_weekly['RegionEast of England','Estimate'],0))`
    - North East: -£`r abs(round(coef_table_weekly['RegionNorth East','Estimate'],0))`
    - North West: -£`r abs(round(coef_table_weekly['RegionNorth West','Estimate'],0))`
    - Northern Ireland: -£`r abs(round(coef_table_weekly['RegionNorthern Ireland','Estimate'],0))`
    - Scotland: -£`r abs(round(coef_table_weekly['RegionScotland','Estimate'],0))`
    - South East: -£`r abs(round(coef_table_weekly['RegionSouth East','Estimate'],0))`
    - Wales: -£`r abs(round(coef_table_weekly['RegionWales','Estimate'],0))`
    - West
Midlands: -£`r abs(round(coef_table_weekly['RegionWest Midlands','Estimate'],0))`
    - Yorkshire and the Humber: -£`r abs(round(coef_table_weekly['RegionYorkshire and the Humber','Estimate'],0))`
- People who arrived in the UK within the last year earn £`r abs(round(coef_table_weekly['BORNUK_labelledWithin the last year','Estimate'],0))` less than people born in the UK.
- People who arrived in the UK within the last 3 years earn £`r abs(round(coef_table_weekly['BORNUK_labelledWithin the last 3 years','Estimate'],0))` less than people born in the UK.
- People who arrived in the UK within the last 5 years earn £`r abs(round(coef_table_weekly['BORNUK_labelledWithin the last 5 years','Estimate'],0))` less than people born in the UK.
- People who arrived in the UK within the last 30 years earn £`r abs(round(coef_table_weekly['BORNUK_labelledWithin the last 30 years','Estimate'],0))` more than people born in the UK.

## Gender pay gap

```{r}
#| output: false
#| message: false
#| warning: false

# Annual income: gender x outsourcing status interaction, without controls
simp_mod <- lm(income_annual_all ~ outsourcing_status*Gender, income_data, weights = NatRepemployees)
summary(simp_mod)

# Annual income: same interaction, with demographic controls
mod <- lm(income_annual_all ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region + outsourcing_status + BORNUK_labelled + Gender:outsourcing_status, income_data, weights = NatRepemployees)
summary(mod)

# Weekly income: interaction with demographic controls
mod_weekly <- lm(income_weekly_all ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region + outsourcing_status + BORNUK_labelled + Gender:outsourcing_status, income_data, weights = NatRepemployees)
summary(mod_weekly)

# Weekly income: interaction without controls
simp_mod_weekly <- lm(income_weekly_all ~ outsourcing_status*Gender, income_data, weights = NatRepemployees)
summary(simp_mod_weekly)
```

::: {.callout-warning title="#gender-pay-gap"}
- On average within our sample, male workers earn £6400 more than female workers per year; but further exploration of how pay relates to gender for outsourced workers suggests that this gender pay gap doesn’t differ in a statistically significant way depending on whether workers are outsourced or not.
- For 
female outsourced workers, this suggests that being an outsourced worker neither exacerbates nor diminishes the gender pay gap they face compared to male workers. **Check what this controls for**:::### Outsourcing status```{r gender-pay-gap-1}gender_outsourced_gap <- income_data %>% group_by(outsourcing_status, Gender) %>% summarise( n = n(), mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T), median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = T), min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = T), max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = T), stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = T)) )not_outsourced_gap <- gender_outsourced_gap %>% filter(outsourcing_status == "Not outsourced") %>% select(c(outsourcing_status, Gender, median)) %>% pivot_wider(names_from = "Gender", values_from = "median") %>% mutate( diff = Male - Female ) %>% pull(diff)outsourced_gap <- gender_outsourced_gap %>% filter(outsourcing_status == "Outsourced") %>% select(c(outsourcing_status, Gender, median)) %>% pivot_wider(names_from = "Gender", values_from = "median") %>% mutate( diff = Male - Female ) %>% pull(diff)gender_outsourced_gap %>% kable() %>% kable_styling(full_width = F)gender_outsourced_gap %>% ggplot(aes(outsourcing_status, median, fill = Gender)) + geom_col(position="dodge") + ggtitle("Annual income")write_csv(gender_outsourced_gap, "../outputs/data/o-status_gender_gap.csv")# weeklygender_outsourced_gap_weekly <- income_data %>% group_by(outsourcing_status, Gender) %>% summarise( n = n(), mean = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm = T), median = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(.5), na.rm = T), min = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(0), na.rm = T), max = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(1), na.rm = T), stdev = 
sqrt(wtd.var(income_weekly_all, w = NatRepemployees, na.rm = T))
  )

# Weekly median gap (male minus female) for non-outsourced workers
not_outsourced_gap_weekly <- gender_outsourced_gap_weekly %>%
  filter(outsourcing_status == "Not outsourced") %>%
  select(c(outsourcing_status, Gender, median)) %>%
  pivot_wider(names_from = "Gender", values_from = "median") %>%
  mutate(
    diff = Male - Female
  ) %>%
  pull(diff)

# Weekly median gap (male minus female) for outsourced workers
outsourced_gap_weekly <- gender_outsourced_gap_weekly %>%
  filter(outsourcing_status == "Outsourced") %>%
  select(c(outsourcing_status, Gender, median)) %>%
  pivot_wider(names_from = "Gender", values_from = "median") %>%
  mutate(
    diff = Male - Female
  ) %>%
  pull(diff)

gender_outsourced_gap_weekly %>%
  kable() %>%
  kable_styling(full_width = F)

gender_outsourced_gap_weekly %>%
  ggplot(aes(outsourcing_status, median, fill = Gender)) +
  geom_col(position = "dodge") +
  ggtitle("Weekly income")

write_csv(gender_outsourced_gap_weekly, "../outputs/data/o-status_gender_gap_weekly.csv")
```

**Annual**^[[outputs/data/o-status_gender_gap.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/o-status_gender_gap.csv) & [outputs/data/mod_o-status_gender.csv](outputs/data/mod_o-status_gender.csv)]:

Exploring the gender pay gap by outsourcing status indicates that the pay gap does not differ depending on whether workers are outsourced or not. For non-outsourced workers, females are paid £`r round(not_outsourced_gap,2)` less than males. For outsourced workers, females are paid £`r round(outsourced_gap,2)` less than males. The difference between non-outsourced and outsourced workers is not significant.

**Weekly**^[[outputs/data/o-status_gender_gap_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/o-status_gender_gap_weekly.csv) & [outputs/data/mod_o-status_gender_weekly.csv](outputs/data/mod_o-status_gender_weekly.csv)]:

Exploring the gender pay gap by outsourcing status indicates that the pay gap does not differ depending on whether workers are outsourced or not. 
For non-outsourced workers, females are paid £`r round(not_outsourced_gap_weekly,2)` less than males. For outsourced workers, females are paid £`r round(outsourced_gap_weekly,2)` less than males. The difference between non-outsourced and outsourced workers is not significant.

```{r gender-outsourcing-int}
#| output: false
ggplot(gender_outsourced_gap, aes(outsourcing_status, median, fill = Gender)) +
  geom_col(position = "dodge") +
  geom_label(aes(label = round(median, 0)), position = position_dodge(width = 0.9)) +
  theme_minimal() +
  ylab("Median income") +
  xlab("Outsourcing status")

simp_mod <- lm(income_annual_all ~ Gender*outsourcing_status, income_data, weights = NatRepemployees)
summary(simp_mod)

# simp_mod2 <- update(simp_mod, ~. + Has_Degree)
# summary(simp_mod2)
# anova(simp_mod, simp_mod2)

mod_2 <- lm(income_annual_all ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender*outsourcing_status, income_data, weights = NatRepemployees)
summary(mod_2)

mod_3 <- update(mod_2, ~. + BORNUK_labelled)
summary(mod_3)

anova(mod_2, mod_3) # adding BORNUK improves model fit

coef_table <- extract_lm_coefs(mod_3)
rownames(coef_table) <- coef_table$variable
# use the model fitted in this chunk (mod_3) rather than `mod` from an earlier chunk
sig_coefs <- extract_glm_coefs(mod_3, only_sig = T)

write_csv(coef_table, "../outputs/data/mod_o-status_gender.csv")

mod_3_weekly <- lm(income_weekly_all ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender*outsourcing_status + BORNUK_labelled, income_data, weights = NatRepemployees)
summary(mod_3_weekly)

coef_table_weekly <- extract_lm_coefs(mod_3_weekly)
rownames(coef_table_weekly) <- coef_table_weekly$variable
sig_coefs <- extract_glm_coefs(mod_3_weekly, only_sig = T)

write_csv(coef_table_weekly, "../outputs/data/mod_o-status_gender_weekly.csv")
```

The gender by outsourcing status interaction is also not relevant for whether a worker is low income (i.e. 
non-sig relationship with income_group).```{r}#| output: falsemod <-glm(income_group ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender*outsourcing_status + BORNUK_labelled, income_data, family="quasibinomial", weights = NatRepemployees)summary(mod)# test <- summary(mod)# # or <- exp(mod[["coefficients"]][["outsourcing_statusOutsourced"]])# p <- test[["coefficients"]][2,4]# # coef_table <- extract_lm_coefs(mod_3)# rownames(coef_table) <- coef_table$variable# sig_coefs <- extract_glm_coefs(mod, only_sig = T)write_csv(coef_table, "../outputs/data/mod_gender_outsourcing_income_group.csv")```### Outsourcing group```{r gender-pay-gap-group}#| output: false#| warnings: false#| messages: falsegender_outsourced_gap <- income_data %>% group_by(outsourcing_group, Gender) %>% summarise( n = n(), mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T), median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = T), min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = T), max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = T), stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = T)) )gender_outsourced_gap %>% kable() %>% kable_styling(full_width = F)gender_outsourced_gap %>% ggplot(aes(outsourcing_group, median, fill = Gender)) + geom_col(position="dodge") + ggtitle("Annual income")write_csv(gender_outsourced_gap, "../outputs/data/o-group_gender_gap.csv")# weeklygender_outsourced_gap_weekly <- income_data %>% group_by(outsourcing_group, Gender) %>% summarise( n = n(), mean = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm = T), median = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(.5), na.rm = T), min = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(0), na.rm = T), max = wtd.quantile(income_weekly_all, w = NatRepemployees, probs = c(1), na.rm = T), stdev = sqrt(wtd.var(income_weekly_all, w = NatRepemployees, 
na.rm = T))
  )

gender_outsourced_gap_weekly %>%
  kable() %>%
  kable_styling(full_width = F)

gender_outsourced_gap_weekly %>%
  ggplot(aes(outsourcing_group, median, fill = Gender)) +
  geom_col(position = "dodge") +
  ggtitle("Weekly income")

write_csv(gender_outsourced_gap_weekly, "../outputs/data/o-group_gender_gap_weekly.csv")

# models
## annual
mod_3 <- lm(income_annual_all ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender*outsourcing_group + BORNUK_labelled, income_data, weights = NatRepemployees)
summary(mod_3)

coef_table <- extract_lm_coefs(mod_3)
rownames(coef_table) <- coef_table$variable
# use mod_3 here: `mod` still holds the outsourcing_status model from the previous chunk
sig_coefs <- extract_glm_coefs(mod_3, only_sig = T)

write_csv(coef_table, "../outputs/data/mod_o-group_gender.csv")

# weekly
mod_3_weekly <- lm(income_weekly_all ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender*outsourcing_group + BORNUK_labelled, income_data, weights = NatRepemployees)
summary(mod_3_weekly)

coef_table_weekly <- extract_lm_coefs(mod_3_weekly)
rownames(coef_table_weekly) <- coef_table_weekly$variable
sig_coefs_weekly <- extract_glm_coefs(mod_3_weekly, only_sig = T)

write_csv(coef_table_weekly, "../outputs/data/mod_o-group_gender_weekly.csv")
```

**Annual data files**^[[outputs/data/o-group_gender_gap.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/o-group_gender_gap.csv) & [outputs/data/mod_o-group_gender.csv](outputs/data/mod_o-group_gender.csv)]:

**Weekly data files**^[[outputs/data/o-group_gender_gap_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/o-group_gender_gap_weekly.csv) & [outputs/data/mod_o-group_gender_weekly.csv](outputs/data/mod_o-group_gender_weekly.csv)]:

The gender by outsourcing group interaction is also not relevant for whether a worker is low income (i.e. 
non-sig relationship with income_group).```{r}#| output: falsemod <-glm(income_group ~ Age + Has_Degree + Ethnicity_collapsed + Region + Gender*outsourcing_group + BORNUK_labelled, income_data, family="quasibinomial", weights = NatRepemployees)summary(mod)# test <- summary(mod)# # or <- exp(mod[["coefficients"]][["outsourcing_statusOutsourced"]])# p <- test[["coefficients"]][2,4]# # coef_table <- extract_lm_coefs(mod_3)# rownames(coef_table) <- coef_table$variable# sig_coefs <- extract_glm_coefs(mod, only_sig = T)write_csv(coef_table, "../outputs/data/mod_gender_outsourcing_income_group.csv")```::: {.callout-tip title="#gender-income-group"}- In particular, people are more likely to be in our low-paid outsourced group if they are female, or older workers .:::Income group[^21][^21]: [../outputs/data/income_group_outsourcing.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/income_group_outsourcing.csv)```{r}#| output: false# test significancemod <-glm(income_group ~ outsourcing_status, data, family="quasibinomial", weights = NatRepemployees)summary(mod)test <-summary(mod)or <-exp(mod[["coefficients"]][["outsourcing_statusOutsourced"]])p <- test[["coefficients"]][2,4]mod_2 <-glm(income_group ~ Age + Gender + Has_Degree + Ethnicity_collapsed + Region + outsourcing_status + BORNUK_labelled, income_data, family="quasibinomial", weights = NatRepemployees)summary(mod_2)# test <- summary(mod_2)or <-exp(mod_2[["coefficients"]][["outsourcing_statusOutsourced"]])p <- test[["coefficients"]][2,4]coef_table <-extract_glm_coefs(mod_2)rownames(coef_table) <- coef_table$variablesig_coefs <-extract_glm_coefs(mod_2, only_sig = T)write_csv(coef_table, file="../outputs/data/income_group_outsourcing.csv")```A person is more likely to be in the low income group if they are:- Older- Female- Don't have a degree (or don't know if they have a degree?)- Are outsourced- Arrived in the UK in the last yearAnd less likely if they are:- Younger- Male- Have a degree- Live 
in the North West or Wales (compared to London)- Arrived in the UK in last 30 years::: {.callout-tip title="#gender-by-pay-split"}Is there already a basic low / high pay split for gender? I know you talk about women being more likely to be in the low-paid group, but again not sure if there is just a basic “women make up x% of low pay group and x% of not low pay group”?:::```{r}#| message: falsegender_pay_split <- income_data %>%filter(!is.na(income_group)) %>%group_by(outsourcing_status, income_group, Gender) %>%summarise(freq =sum(NatRepemployees),n =n() ) %>%mutate(total =sum(freq),percentage =100* (freq / total),N =sum(n) )low_pay_perc <- gender_pay_split %>%filter(income_group =="Low"& outsourcing_status =="Outsourced"& Gender =="Female") %>%mutate(round(percentage,2) ) %>%pull()high_pay_perc <- gender_pay_split %>%filter(income_group =="Not low"& outsourcing_status =="Outsourced"& Gender =="Female") %>%mutate(round(percentage,2) ) %>%pull()mod <-glm(income_group ~ Gender, income_data, weights = NatRepemployees, family ="quasibinomial")# summary(mod)mod_2 <-glm(income_group ~ Gender * outsourcing_status, income_data, weights = NatRepemployees, family ="quasibinomial")# summary(mod_2)````r low_pay_perc`% of outsourced workers in the low pay group were female, compared to `r high_pay_perc`% of outsourced workers in the not low pay group. This difference is statistically significant; women are more likely to be in the low income group. 
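
The female share quoted above depends on how `summarise()` handles grouping: after grouping by three variables it drops only the last one (`Gender`), so the following `sum(freq)` is taken within each outsourcing status by income group cell. A minimal sketch of that step, assuming made-up toy values and a hypothetical `weight` column standing in for `NatRepemployees`:

```r
library(dplyr)

# Hypothetical toy data (not survey values); `weight` stands in for NatRepemployees.
toy <- tibble(
  outsourcing_status = rep("Outsourced", 4),
  income_group = c("Low", "Low", "Not low", "Not low"),
  Gender = c("Female", "Male", "Female", "Male"),
  weight = c(3, 1, 1, 1)
)

toy_split <- toy %>%
  group_by(outsourcing_status, income_group, Gender) %>%
  # drop_last keeps the outsourcing_status x income_group grouping
  summarise(freq = sum(weight), .groups = "drop_last") %>%
  # total is therefore the weighted size of each status-by-income-group cell
  mutate(total = sum(freq), percentage = 100 * freq / total) %>%
  ungroup()

# Weighted share of women within each income group
female_share <- toy_split %>%
  filter(Gender == "Female") %>%
  select(income_group, percentage)

print(female_share)
```

With these toy weights, women make up 75% of the low pay cell and 50% of the not low pay cell. Making `.groups = "drop_last"` explicit documents the denominator and silences the grouping message that `summarise()` otherwise emits.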
This pattern is the same for non outsourced workers, and there is no interaction effect; irrespective of outsourcing status, women are more likely to be low paid, and irrespective of gender, outsourced people are more likely to be low paid.```{r}gender_pay_split %>%ggplot(aes(income_group, percentage, fill = Gender)) +facet_grid(rows =vars(outsourcing_status)) +geom_col(position="dodge") +theme_minimal()```::: {.callout-important title="#pay-gap-sector"}- Overall, we find that workers in administrative and support service activities – one of the dominant sectors for outsourced workers in this research – are more likely to be lower-paid than non-outsourced workers in the same sector. The same is true for outsourced water supply (full name; sewerage, waste etc.) workers – another prominent outsourcing sector – information and communication, transportation and storage, and education workers, amongst others. In contrast, we find outsourced workers in financial and insurance activities, for example, appear to be slightly higher paid on average than their non-outsourced counterparts; however, this is one of the few sectors in which this appears to be the case.**to be confirmed**I don’t quite understand the chart below the above chart in the file, would you be able to explain it – thanks! Is this the best chart to use, above? 
Does this need to control for anything else to show us the most accurate analysis of pay by sector for outsourced and non outsourced, or are we confident that this is showing us something notable about sector and pay?:::## Sectoral pay differences^[[outputs/data/sector_summary_pay_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/sector_summary_pay_weekly.csv)]```{r sector-bubble}sector_summary_pay <- data %>% filter(income_drop_all == 0 & !is.na(income_weekly_all)) %>% group_by(SectorName, SectorName_labelled, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), avg_income = mean(income_weekly_all, na.rm=T), wtd_avg_income = weighted.mean(income_weekly_all, w = NatRepemployees, na.rm=T) ) %>% ungroup() %>% group_by(SectorName) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum), SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled), SectorName_short = SectorName_labelled ) %>% # make the sector names more readable separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim=";", too_few = "align_start") %>% mutate( SectorName_short = factor(stringr::str_to_sentence(SectorName_short)), SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)), )write_csv(sector_summary_pay, file="../outputs/data/sector_summary_pay_weekly.csv")plot_data <- sector_summary_pay %>% drop_na(SectorName_short) %>% droplevels() %>% ungroup()# Filter for 'outsourced' level and reorder SectorName_shortnot_outsourced_levels <- plot_data %>% filter(outsourcing_status == 'Not outsourced') %>% mutate(SectorName_short = forcats::fct_reorder(SectorName_short, N, .desc = FALSE))# outsourced <- plot_data %>%# filter(outsourcing_status == 'Outsourced') %>%# mutate(# rank = rank(desc(perc))# )# Apply the reordered levels back to the original dataplot_data <- plot_data %>% mutate( SectorName_short = 
factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)),
  ) %>%
  arrange(desc(SectorName_short))

# Total N per sector, positioned to the right of the largest income value
annotation_df <- plot_data %>%
  #filter(outsourcing_status == "Not outsourced") %>%
  select(SectorName_short, n) %>%
  group_by(SectorName_short) %>%
  summarise(
    N = sum(n)
  ) %>%
  mutate(
    ypos = max(plot_data$wtd_avg_income, na.rm=T) * 1.2
  )

plot_data %>%
  # mutate(
  #   SectorName = as.factor(SectorName)
  # ) %>%
  ggplot(., aes(wtd_avg_income, SectorName_short, size = perc, colour = outsourcing_status)) +
  geom_point(position = "dodge") +
  theme_minimal() +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  #coord_flip() +
  scale_x_continuous(breaks = seq(0, max(plot_data$wtd_avg_income, na.rm=T), 100)) +
  scale_colour_manual(values = colours) +
  geom_text(inherit.aes = F, data = annotation_df, aes(x = ypos, y = SectorName_short, label = paste0("N = ", N)), hjust = 1) +
  geom_label_repel(inherit.aes = F, aes(wtd_avg_income, SectorName_short, colour = outsourcing_status, label = paste0("n=", n)), size = 3) +
  guides(size = FALSE) + # remove size legend as gauging size is difficult
  xlab("Weighted average weekly income") +
  ylab("Sector") +
  labs(caption = "Size of bubble represents the size of the respective workforce within the sector")

sectors_of_interest <- unique(plot_data$SectorName_labelled)
sectors_of_interest <- sectors_of_interest[1:13] %>%
  stringr::str_to_title()
```

#### Occupations^[[outputs/data/occupation_in_sector_summary_pay_weekly.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/occupation_in_sector_summary_pay_weekly.csv)]

Here we look at Major subgroup occupations within sectors. We only consider the sectors down to 'Other services', as the remaining sectors have a small n for the outsourced group. 
Note you can find larger images for these plots in [outputs/figures/occupation_pay_plots](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/figures/occupation_pay_plots).The figures indicate there is variation between occupations within sectors in terms of whether outsourced people are paid less or more than non-outsourced workers.```{r}#| height: 10#| width: 10occ_in_sect_summary_pay <- data %>%filter(income_drop_all ==0&!is.na(income_weekly_all)) %>%group_by(SectorName, SectorName_labelled, MajorsubgroupOccupation_labelled, outsourcing_status) %>%summarise(n =n(),Frequency =sum(NatRepemployees),avg_income =mean(income_weekly_all, na.rm=T),wtd_avg_income =weighted.mean(income_weekly_all, w = NatRepemployees, na.rm=T) ) %>%ungroup() %>%group_by(SectorName_labelled, MajorsubgroupOccupation_labelled) %>%mutate(N =sum(n),Sum =sum(Frequency),perc =100* (Frequency/Sum),MajorsubgroupOccupation_labelled =case_when(MajorsubgroupOccupation_labelled =="NA"~NA,TRUE~ MajorsubgroupOccupation_labelled),MajorsubgroupOccupation_labelled = stringr::str_to_title(MajorsubgroupOccupation_labelled),SectorName_labelled = stringr::str_to_title(SectorName_labelled) ) write_csv(occ_in_sect_summary_pay, file="../outputs/data/occupation_in_sector_summary_pay_weekly.csv")for(sector in sectors_of_interest){#print(sector)# subset to this sector and drop na occupatoins plot_data <- occ_in_sect_summary_pay %>%filter(SectorName_labelled == sector) %>%filter(!is.na(MajorsubgroupOccupation_labelled)) %>%droplevels() %>%ungroup()# Order occs by N# First filter for 'outsourced' level and reorder by N not_outsourced_levels <- plot_data %>%select(MajorsubgroupOccupation_labelled, outsourcing_status, N) %>%distinct(MajorsubgroupOccupation_labelled, N) %>%mutate(MajorsubgroupOccupation_labelled = forcats::fct_reorder(MajorsubgroupOccupation_labelled, N, .desc =FALSE))# not_outsourced_levels <- plot_data %>%# filter(outsourcing_status == 'Not outsourced') %>%# 
mutate(MajorsubgroupOccupation_labelled = forcats::fct_reorder(MajorsubgroupOccupation_labelled, N, .desc = FALSE))# Then apply the reordered levels back to the original data plot_data <- plot_data %>%mutate(MajorsubgroupOccupation_labelled =factor(MajorsubgroupOccupation_labelled, levels =levels(not_outsourced_levels$MajorsubgroupOccupation_labelled)), ) annotation_df <- plot_data %>%#filter(outsourcing_status == "Not outsourced") %>%select(MajorsubgroupOccupation_labelled, n) %>%group_by(MajorsubgroupOccupation_labelled) %>%summarise(N =sum(n) ) %>%mutate(ypos =max(plot_data$wtd_avg_income, na.rm=T) *1.2 ) p <- plot_data %>%ggplot(., aes(wtd_avg_income, MajorsubgroupOccupation_labelled, size = perc, colour = outsourcing_status)) +geom_point(position ="dodge") +geom_label_repel(inherit.aes = F, aes(wtd_avg_income, MajorsubgroupOccupation_labelled, colour = outsourcing_status, label=paste0("n=",n)), size=3, #force_pull = 2 ) +theme_minimal() +theme(legend.position ="bottom",legend.title =element_blank()) +#coord_flip() +scale_x_continuous(breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 200)) +scale_colour_manual(values=colours) +geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=MajorsubgroupOccupation_labelled, label =paste0("N = ", N)), hjust=1) +guides(size=FALSE) +# remove size legend as gauging size is difficult xlab("Weighted average weekly income") +ylab("Major subgroup occupation") +labs(caption ="Size of bubble represents the size of the respective workforce within the occupation") +ggtitle(sector)show(p)ggsave(here('outputs','figures','occupation_pay_plots',paste0('occupation_pay_plot_', sector, '.png')), height =8, width =8, dpi=800, bg="white")}```### Weekly pay penalty in occupations within sectors```{r}#| height: 10#| width: 10unit_occ_in_sect_summary_pay <- data %>%filter(income_drop_all ==0&!is.na(income_weekly_all)) %>%group_by(SectorName, SectorName_labelled, UnitOccupation_labelled, outsourcing_status) %>%summarise(n =n(),Frequency 
=sum(NatRepemployees),avg_income =mean(income_weekly_all, na.rm=T),wtd_avg_income =weighted.mean(income_weekly_all, w = NatRepemployees, na.rm=T) ) %>%ungroup() %>%group_by(SectorName_labelled, UnitOccupation_labelled) %>%mutate(N =sum(n),Sum =sum(Frequency),perc =100* (Frequency/Sum),UnitOccupation_labelled =case_when(UnitOccupation_labelled =="NA"~NA,TRUE~ UnitOccupation_labelled),UnitOccupation_labelled = stringr::str_to_title(UnitOccupation_labelled),SectorName_labelled = stringr::str_to_title(SectorName_labelled) ) %>%ungroup()write_csv(unit_occ_in_sect_summary_pay, file="../outputs/data/unit_occupation_in_sector_summary_pay_weekly.csv")# need to identify the unit occs that have an ok nunit_subset <- unit_occ_in_sect_summary_pay %>%group_by(SectorName_labelled,UnitOccupation_labelled) %>%mutate(min_n =min(n, na.rm=TRUE) ) %>%filter(min_n >=10)# create a df with occs where outsourced paid less so we can just list itpaid_less <- unit_subset %>%pivot_wider(id_cols =c(SectorName_labelled, UnitOccupation_labelled), names_from = outsourcing_status, values_from =c(wtd_avg_income, n)) %>% janitor::clean_names() %>%mutate(pay_penalty = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced ) %>%filter( pay_penalty <0 )write_csv(paid_less, file="../outputs/data/unit_occupation_in_sector_weekly_pay_penalty.csv")for(sector in sectors_of_interest){#print(sector)# subset to this sector and drop na occupatoins plot_data <- unit_subset %>%filter(SectorName_labelled == sector) %>%filter(!is.na(UnitOccupation_labelled)) %>%droplevels() %>%ungroup()# Order occs by N# First filter for 'outsourced' level and reorder by N not_outsourced_levels <- plot_data %>%select(UnitOccupation_labelled, outsourcing_status, N) %>%distinct(UnitOccupation_labelled, N) %>%mutate(UnitOccupation_labelled = forcats::fct_reorder(UnitOccupation_labelled, N, .desc =FALSE))# not_outsourced_levels <- plot_data %>%# filter(outsourcing_status == 'Not outsourced') %>%# mutate(UnitOccupation_labelled = 
forcats::fct_reorder(UnitOccupation_labelled, N, .desc = FALSE))# Then apply the reordered levels back to the original data plot_data <- plot_data %>%mutate(UnitOccupation_labelled =factor(UnitOccupation_labelled, levels =levels(not_outsourced_levels$UnitOccupation_labelled)), ) annotation_df <- plot_data %>%#filter(outsourcing_status == "Not outsourced") %>%select(UnitOccupation_labelled, n) %>%group_by(UnitOccupation_labelled) %>%summarise(N =sum(n) ) %>%mutate(ypos =max(plot_data$wtd_avg_income, na.rm=T) *1.2 ) p <- plot_data %>%ggplot(., aes(wtd_avg_income, UnitOccupation_labelled, size = perc, colour = outsourcing_status)) +geom_point(position ="dodge") +geom_label_repel(inherit.aes = F, aes(wtd_avg_income, UnitOccupation_labelled, colour = outsourcing_status, label=paste0("n=",n)), size=3, #force_pull = 2 ) +theme_minimal() +theme(legend.position ="bottom",legend.title =element_blank()) +#coord_flip() +scale_x_continuous(breaks=scales::breaks_pretty(n=5)) +#breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 200)) +scale_colour_manual(values=colours) +geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=UnitOccupation_labelled, label =paste0("N = ", N)), hjust=1) +guides(size=FALSE) +# remove size legend as gauging size is difficult xlab("Weighted average weekly income") +ylab("Unit occupation") +labs(caption ="Size of bubble represents the size of the respective workforce within the occupation") +ggtitle(sector)show(p)ggsave(here('outputs','figures','occupation_pay_plots',paste0('unit_occupation_pay_plot_', sector, '.png')), height =8, width =8, dpi=800, bg="white")}```Many instances where outsourced workers within a unit occupation are paid less than their non-outsourced counterparts:^[[outputs/data/unit_occupation_in_sector_weekly_pay_penalty.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/unit_occupation_in_sector_weekly_pay_penalty.csv)]```{r}paid_less %>%arrange(pay_penalty) %>%kable(caption ="Weekly pay penalty for 
unit occupations within sectors") %>%kable_styling()```### Weekly pay penalty in occupations across all sectors```{r}#| height: 10#| width: 10unit_occ_summary_pay <- data %>%filter(income_drop_all ==0&!is.na(income_weekly_all)) %>%group_by(UnitOccupation_labelled, outsourcing_status) %>%summarise(n =n(),Frequency =sum(NatRepemployees),avg_income =mean(income_weekly_all, na.rm=T),wtd_avg_income =weighted.mean(income_weekly_all, w = NatRepemployees, na.rm=T) ) %>%ungroup() %>%group_by(UnitOccupation_labelled) %>%mutate(N =sum(n),Sum =sum(Frequency),perc =100* (Frequency/Sum),UnitOccupation_labelled =case_when(UnitOccupation_labelled =="NA"~NA,TRUE~ UnitOccupation_labelled),UnitOccupation_labelled = stringr::str_to_title(UnitOccupation_labelled) ) %>%ungroup()write_csv(unit_occ_summary_pay, file="../outputs/data/unit_occupation_summary_pay_weekly.csv")# need to identify the unit occs that have an ok n# subste to occs with n>=10unit_subset <- unit_occ_summary_pay %>%group_by(UnitOccupation_labelled) %>%mutate(min_n =min(n, na.rm=TRUE) ) %>%filter(min_n >=10)# create a df with occs where outsourced paid less so we can just list itpaid_less <- unit_subset %>%pivot_wider(id_cols =c(UnitOccupation_labelled), names_from = outsourcing_status, values_from =c(wtd_avg_income, n)) %>% janitor::clean_names() %>%mutate(pay_penalty = wtd_avg_income_outsourced - wtd_avg_income_not_outsourced ) %>%filter( pay_penalty <0 )write_csv(paid_less, file="../outputs/data/unit_occupation_weekly_pay_penalty.csv")#print(sector)# subset to this sector and drop na occupatoinsplot_data <- unit_subset %>%filter(!is.na(UnitOccupation_labelled)) %>%droplevels() %>%ungroup()# Order occs by N# First filter for 'outsourced' level and reorder by Nnot_outsourced_levels <- plot_data %>%select(UnitOccupation_labelled, outsourcing_status, N) %>%distinct(UnitOccupation_labelled, N) %>%mutate(UnitOccupation_labelled = forcats::fct_reorder(UnitOccupation_labelled, N, .desc =FALSE))# not_outsourced_levels <- 
plot_data %>%# filter(outsourcing_status == 'Not outsourced') %>%# mutate(UnitOccupation_labelled = forcats::fct_reorder(UnitOccupation_labelled, N, .desc = FALSE))# Then apply the reordered levels back to the original dataplot_data <- plot_data %>%mutate(UnitOccupation_labelled =factor(UnitOccupation_labelled, levels =levels(not_outsourced_levels$UnitOccupation_labelled)), )annotation_df <- plot_data %>%#filter(outsourcing_status == "Not outsourced") %>%select(UnitOccupation_labelled, n) %>%group_by(UnitOccupation_labelled) %>%summarise(N =sum(n) ) %>%mutate(ypos =max(plot_data$wtd_avg_income, na.rm=T) *1.2 )p <- plot_data %>%ggplot(., aes(wtd_avg_income, UnitOccupation_labelled, size = perc, colour = outsourcing_status)) +geom_point(position ="dodge") +geom_label_repel(inherit.aes = F, aes(wtd_avg_income, UnitOccupation_labelled, colour = outsourcing_status, label=paste0("n=",n)), size=3, #force_pull = 2 ) +theme_minimal() +theme(legend.position ="bottom",legend.title =element_blank()) +#coord_flip() +scale_x_continuous(breaks=scales::breaks_pretty(n=5)) +#breaks=seq(0,max(plot_data$wtd_avg_income, na.rm=T), 200)) +scale_colour_manual(values=colours) +geom_text(inherit.aes=F,data=annotation_df, aes(x=ypos, y=UnitOccupation_labelled, label =paste0("N = ", N)), hjust=1) +guides(size=FALSE) +# remove size legend as gauging size is difficult xlab("Weighted average weekly income") +ylab("Unit occupation") +labs(caption ="Size of bubble represents the size of the respective workforce within the occupation") +ggtitle("All sectors")show(p)ggsave(here('outputs','figures','occupation_pay_plots','unit_occupation_pay_plot.png'), height =8, width =8, dpi=800, bg="white")```Looking at occupations across all sectors, there are many occupations where outsourced workers within a unit occupation are paid less than their non-outsourced 
counterparts:^[[outputs/data/unit_occupation_weekly_pay_penalty.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/unit_occupation_weekly_pay_penalty.csv)]```{r}paid_less %>%arrange(pay_penalty) %>%kable(caption ="Weekly pay penalty for unit occupations across all sectors") %>%kable_styling()```## London has a disproportionate share of the UK’s outsourced workers, followed by the East and West Midlands::: {.callout-tip title="#regions"}- In London, around 25% of workers are outsourced – the highest proportion of any region in the UK. London is followed by the East Midlands (19%) and West Midlands (18%) in the share of workers in the region who are outsourced, with the East of England being the region with the lowest share of outsourced workers as part of the total employed workforce, at 13%.- Possible addition: Should this include some comment on WHY we think this might be the case? Should we look at sectoral splits in London, compared to everywhere else, to see whether there are significant sector differences that might explain this trend?:::The plot below shows the proportion of workers within each region who are outsourced.[^22][^22]: [outputs/data/region_stats_2.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_2.csv)```{r}region_statistics_2 <- data %>%# get values of labels# mutate_all(haven::as_factor) %>%group_by(Region, outsourcing_status) %>%summarise(Frequency =sum(NatRepemployees),n =n(), ) %>%mutate(N =sum(n),Sum =sum(Frequency),Percentage =100* (Frequency / Sum) ) %>%rename(`Outsourcing status`= outsourcing_status ) %>%ungroup()reg_levels <- region_statistics_2 %>%filter(`Outsourcing status`=="Outsourced") %>%mutate(Region = forcats::fct_reorder(Region, Percentage, .desc=FALSE) )annotation_df <- region_statistics_2 %>%filter(`Outsourcing status`=="Not outsourced") %>%select(Region, N) %>%mutate(ypos =100 )region_statistics_2 %>%mutate(Region =factor(Region, levels 
    = levels(reg_levels$Region))) %>%
  ggplot(., aes(Region, Percentage, fill = `Outsourcing status`)) +
  geom_col(colour = "black") +
  geom_text(inherit.aes = FALSE, data = annotation_df, aes(Region, ypos, label = paste0("N=", N)), hjust = 1, nudge_y = -2) +
  coord_flip() +
  scale_fill_manual(values = many_colours) +
  theme_minimal()

readr::write_csv(region_statistics_2, file = "../outputs/data/region_stats_2.csv")

# Outsourced share by region, excluding London, for the ranked list below
region_statistics_2_1 <- region_statistics_2 %>%
  filter(`Outsourcing status` == "Outsourced" & Region != "London")

london_perc <- region_statistics_2[which(region_statistics_2$Region == "London" & region_statistics_2["Outsourcing status"] == "Outsourced"), "Percentage"]
```

Below we map the workforce composition in each region. The first map emphasises that London has the highest concentration of outsourced workers (`r round(region_statistics_2[which(region_statistics_2$Region == "London" & region_statistics_2["Outsourcing status"] == "Outsourced"), "Percentage"],0)`%).

```{r}
knitr::include_graphics('../outputs/figures/outsourcing_by_region.svg')
```

The second map excludes London so that it is easier to see how the remaining regions compare. After London, the regions with the highest proportion of outsourced workers are:

1. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 1), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 1), "Percentage"],0)`%)
2. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 2), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 2), "Percentage"],0)`%)
3. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 3), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 3), "Percentage"],0)`%)
4. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 4), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 4), "Percentage"],0)`%)
5. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 5), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 5), "Percentage"],0)`%)

```{r}
knitr::include_graphics('../outputs/figures/outsourcing_by_region_excl_london.svg')
```

```{r}
# Each region's share of the UK's outsourced workforce
# (the denominator is all outsourced workers, not all workers)
region_statistics_3 <- data %>%
  filter(outsourcing_status == "Outsourced") %>%
  # get values of labels
  # mutate_all(haven::as_factor) %>%
  group_by(Region) %>%
  summarise(
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

readr::write_csv(region_statistics_3, file = "../outputs/data/region_stats_3.csv")
```

We can also explore how the UK's outsourced workforce is distributed across the country.[^23] The table and map below show the outsourced workers in each region as a percentage of all outsourced workers in the UK. They show where the UK's outsourced workforce is concentrated. The regions with the highest share of the UK's outsourced workforce are:

[^23]: [outputs/data/region_stats_3.csv](https://github.com/JustKnowledge-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_3.csv)

1. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 1), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 1), "Percentage"],0)`%)
2. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 2), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 2), "Percentage"],0)`%)
3. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 3), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 3), "Percentage"],0)`%)
4. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 4), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 4), "Percentage"],0)`%)
5. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 5), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 5), "Percentage"],0)`%)

```{r}
region_statistics_3 %>%
  mutate(
    Region = haven::as_factor(Region)
  ) %>%
  arrange(desc(Percentage)) %>%
  knitr::kable(., digits = 2) %>%
  kable_styling(full_width = FALSE)
```

```{r}
knitr::include_graphics('../outputs/figures/outsourcing_distribution_across_regions.svg')
```
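
The two regional measures in this section use different denominators, which is easy to lose track of. As a minimal sketch, using an invented toy data frame (the regions `A`, `B` and `C` and their counts are hypothetical and are not survey results; only the column names mirror those used in this chapter), the two percentages can be computed side by side:

```r
library(dplyr)

# Hypothetical weighted counts, invented purely for illustration
toy <- data.frame(
  Region = c("A", "A", "B", "B", "C", "C"),
  outsourcing_status = rep(c("Outsourced", "Not outsourced"), times = 3),
  NatRepemployees = c(30, 70, 20, 180, 50, 450)
)

# Measure 1: outsourced workers as a percentage of all workers in the region
within_region <- toy %>%
  group_by(Region) %>%
  mutate(Percentage = 100 * NatRepemployees / sum(NatRepemployees)) %>%
  ungroup() %>%
  filter(outsourcing_status == "Outsourced") %>%
  select(Region, Percentage)

# Measure 2: the region's outsourced workers as a percentage of
# all outsourced workers across every region
share_of_outsourced <- toy %>%
  filter(outsourcing_status == "Outsourced") %>%
  mutate(Percentage = 100 * NatRepemployees / sum(NatRepemployees)) %>%
  select(Region, Percentage)

print(within_region)
print(share_of_outsourced)
```

In this toy example, region `C` has the joint-lowest within-region share (10%) yet holds half of all outsourced workers, simply because its workforce is the largest. A region can therefore rank quite differently on the two measures.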